SYSTEMS 

RESEARCH 

CENTER 


Supported by the
National  Science  Foundation 
Engineering  Research  Center 
Program  (NSFD  CD  8803012), 
the  University  of  Maryland, 
Harvard  University, 
and  Industry 


Multiple  Frequency  Estimation 
in  Mixed-Spectrum  Time  Series 
by  Parametric  Filtering 


by  T-H.  Li 
Advisor:  B.  Kedem 


Ph.D.  92-7 


Report  Documentation  Page 

Form  Approved 

OMB  No.  0704-0188 


1.  REPORT  DATE 

1992 

2.  REPORT  TYPE 

3.  DATES  COVERED 

00-00-1992  to  00-00-1992 

4.  TITLE  AND  SUBTITLE 

5a.  CONTRACT  NUMBER 

Multiple Frequency Estimation in Mixed-Spectrum Time Series by Parametric Filtering

5b. GRANT NUMBER

5c.  PROGRAM  ELEMENT  NUMBER 

6.  AUTHOR(S) 

5d. PROJECT NUMBER

5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

University  of  Maryland, The  Graduate  School, 2123  Lee  Building, College 
Park, MD, 20742 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

10.  SPONSOR/MONITOR'S  ACRONYM(S) 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

14.  ABSTRACT 

see  report 

15. SUBJECT TERMS

16.  SECURITY  CLASSIFICATION  OF: 

17.  LIMITATION  OF 
ABSTRACT 

18.  NUMBER 
OF  PAGES 

155 

19a.  NAME  OF 
RESPONSIBLE  PERSON 

a.  REPORT 

unclassified 

b.  ABSTRACT 

unclassified 

c.  THIS  PAGE 

unclassified 

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std  Z39-18 


Abstract 


Title  of  Dissertation:  MULTIPLE  FREQUENCY  ESTIMATION 

IN  MIXED-SPECTRUM  TIME  SERIES 
BY  PARAMETRIC  FILTERING 


Ta-Hsin  Li,  Doctor  of  Philosophy,  1992 


Dissertation  directed  by:  Dr.  Benjamin  Kedem 

Mathematics  Department  and 
Systems  Research  Center 


A general parametric filtering procedure (the PF method) is proposed for the
problem of multiple frequency estimation in mixed-spectrum time series (i.e.,
superimposed sinusoids in additive noise). The method is based on the fact that a sum
of sinusoids satisfies a homogeneous autoregressive (AR) equation. The gist of the
method is to parametrize a linear filter so that it possesses a certain parametrization
property as suggested by the particular form of the bias encountered by Prony’s (least
squares) estimator. For any parametric filter with this property, in addition to some
mild regularity conditions, the least squares estimator from the filtered data, as a
function of the filter parameter, constitutes a contractive mapping whose multivariate
fixed-point serves as a consistent AR estimator. The chronic bias of Prony’s estimator
is thus eliminated. Coupled with the all-pole (AR) filter endowed with an extra
bandwidth parameter, the PF method can achieve the accuracy of nonlinear least squares
by a simple iterative procedure consisting of linear least squares estimation followed
by linear recursive filtering. Crude initial guesses such as those from Prony’s
estimator are sufficient to initiate the iteration. The method is also capable of resolving
closely-spaced frequencies which are unresolvable by periodogram analysis or DFT.

To analyze the statistical properties of the PF method, some classical asymptotic
results concerning the sample autocovariances are extended to accommodate
mixed-spectrum time series and parametric filtering. In particular, under regularity
conditions, uniform strong consistency and asymptotic normality are proved for the sample
autocovariances of a mixed-spectrum time series after parametric filtering. Equipped
with these results, some statistical properties of the PF method itself are investigated.
These include the existence of the PF estimator as a fixed-point of the parametric
least squares mapping, the convergence of an iterative algorithm that calculates the
PF estimator, as well as the strong consistency and asymptotic normality of the PF
estimator.

Computer  simulations  are  also  presented  to  demonstrate  the  effectiveness  of  the 
PF  method.  Directions  for  future  research  are  briefly  discussed  at  the  end  of  the 
dissertation. 


MULTIPLE  FREQUENCY  ESTIMATION 
IN  MIXED-SPECTRUM  TIME  SERIES 
BY  PARAMETRIC  FILTERING 


by 

Ta-Hsin  Li 


Dissertation  submitted  to  the  Faculty  of  the  Graduate  School 
of  the  University  of  Maryland  in  partial  fulfillment 
of  the  requirements  for  the  degree  of 
Doctor  of  Philosophy 
1992 


Advisory  Committee: 

Dr. Benjamin Kedem, Chairman/Advisor

Dr.  Eric  V.  Slud 

Dr.  Piotr  W.  Mikulski 

Dr.  Harland  M.  Glaz 

Dr.  Prakash  Narayan 


©  Copyright  by 
Ta-Hsin  Li 
1992 


Dedication 


To  the  memory  of  my  grandmother 


Acknowledgements 


I would like to express my sincere gratitude and appreciation to my
advisor, Professor Benjamin Kedem, for his encouragement and guidance.
His  rigorous  scholarship  and  optimistic  attitude  toward  life  influenced  and 
stimulated  me  throughout  my  studies  at  the  University  of  Maryland. 

My  gratitude  also  extends  to  Professors  Piotr  Mikulski,  Eric  Slud, 
C.  Z.  Wei,  Ivo  Babuska,  and  Prakash  Narayan.  From  them  I  learned 
many  things  about  statistics,  time  series,  numerical  analysis,  and  signal 
processing,  which  not  only  have  contributed  to  this  dissertation,  but  also 
shall  benefit  the  rest  of  my  career.  I  would  also  like  to  thank  Professor 
Harland  Glaz  for  his  review  of  this  dissertation.  Special  thanks  are  due 
to  Professor  Sid  Yakowitz  of  the  University  of  Arizona  for  his  stimulating 
lectures  which  helped  the  development  of  this  research. 

I  am  very  grateful  to  my  wife,  Zeping,  my  parents,  and  my  family. 
Without  their  constant  love  and  unlimited  support,  this  dissertation  would 
have  been  impossible. 


I am also indebted to Professor Xuelong Zhu and other friends of
Tsinghua University. Their support and assistance made my studying in the
United States possible.

Thanks  are  also  due  to  Silvia  Lopes,  John  Barnnet,  and  Felix  Santos 
for  their  helpful  comments  and  suggestions. 

This  research  was  supported  by  grants  AFOSR-89-0049,  ONR  N00014- 
89-J-1051,  and  NSF  CDR-88-03012. 


Contents 


List  of  Tables  viii 

List  of  Figures  ix 

1  Introduction  1 

1.1  Overview  .  1 

1.2  Problem  Formulation .  6 

1.3  Periodogram  Analysis  and  DFT .  7 

1.4  Nonlinear  Least  Squares .  13 

1.4.1  For  Well-Separated  Frequencies .  14 

1.4.2  For Closely-Spaced Frequencies .  16

1.4.3  Computational  Considerations  .  17 

1.5  Summary .  19 

2  Autoregressive  Estimation  21 

2.1  AR  Approach  of  Frequency  Estimation .  21 

2.2  Estimation  of  AR  Coefficients .  23 

2.3  Inconsistency  of  Prony’s  Estimator .  25 

2.4  High-Order  AR  Method .  29 

2.5  Nonsingularity  of  Autocovariance  Matrix .  33 


3  Limit  Theorems  of  Sample  Autocovariance  Function  35 

3.1  Asymptotic  Normality  of  SACF .  36 

3.2  CLT  for  SACF  from  Filtered  Process .  49 

3.3  Uniform  Strong  Consistency  of  SACF  .  52 

3.4  More  Results  on  Uniform  Strong  Consistency .  61 

4  Frequency  Estimation  by  Parametric  Filtering  64 

4.1  Parametric  Filtering  Method  .  65 

4.1.1  AR  Estimation  After  Parametric  Filtering .  65 

4.1.2  PF Method of Frequency Estimation .  67

4.1.3  Least  Squares  Estimator .  70 

4.1.4  Relation  to  the  CM  Method .  71 

4.2  Statistical  Properties  of  the  PF  Estimator .  72 

4.2.1  Existence  and  Convergence .  72 

4.2.2  Strong  Consistency  and  Asymptotic  Normality .  77 

4.3  Extension  to  Complex  Sinusoids  .  82 

5  PF  Method  with  AR  (All-Pole)  Filter  84 

5.1  General  AR  Filter  and  PF  Estimator .  85 

5.1.1  Parametrization  .  86 

5.1.2  Parameter  Space .  88 

5.2  Statistical  Properties .  92 

5.3  Accuracy  of  the  PF  Estimator .  96 

5.4  Special  Cases  of  One  and  Two  Sinusoids .  100 

5.4.1  A  Single  Sinusoid  in  White  Noise .  100 

5.4.2  Two  Sinusoids  in  White  Noise .  106 

5.5  Experimental Results .  111

5.5.1  Univariate PF method .  111


5.5.2  Multivariate  PF  method  for  Two  sinusoids  .  116 

5.6  Concluding  Remarks .  127 

Bibliography  132 


List  of  Tables 


1.1  Summary  of  Estimation  Methods .  19 

5.1  PF & GLS Estimates for Well-Separated Frequencies .  118

5.2  PF & GLS Estimates for Closely-Spaced Frequencies .  118

5.3  Estimation With $\eta = 1$ .  120


List  of  Figures 

1.1  Plot  of  squared  gain  of  the  complex  exponential  filter .  11 

1.2  Periodogram  of  four  sinusoids  in  white  noise . 12 

1.3  DFT  of  four  sinusoids  in  white  noise .  13 

1.4  Contour plot of $J_n$ for closely-spaced sinusoids .  18

2.1  Plot  of  high-order  AR  log-spectrum .  31 

5.1  Parameter space $A_0(\eta)$ in the case of two sinusoids .  108

5.2  Frequency  space  in  the  case  of  two  sinusoids .  109 

5.3  Squared  gain  function  and  poles  of  the  AR  filter  for  two  sinusoids  .  .  .  110 

5.4  Least  squares  mapping  in  the  case  of  a  single  sinusoid .  112 

5.5  Plot of mse against data length for increasing values of $\eta$ .  114

5.6  Univariate least squares mapping in the case of two sinusoids .  115

5.7  Plot of mse against $\eta$ for closely-spaced frequencies .  119

5.8  Mean-squared errors and averaged frequency estimates against SNR for closely-spaced frequencies within a Fourier bin - Case I .  122

5.9  Mean-squared errors and averaged frequency estimates against SNR for closely-spaced frequencies within a Fourier bin - Case II .  124

5.10  Two-dimensional  least  squares  mapping  -  Case  I .  125 

5.11  Two-dimensional  least  squares  mapping  -  Case  II .  126 

5.12  Contour  plot  of  2-d  least  squares  mapping .  128 


Chapter  1 


Introduction 


The estimation of frequency from superimposed noisy sinusoids is an important subject
in time series analysis and signal processing. It has applications in various disciplines
of science and technology, such as geophysics, seismology, meteorology, rotating
machinery, radar, sonar, and communications. The survey paper by Kay and Marple
(1981) serves as an excellent reference on modern approaches to this problem.

1.1  Overview 

The problem of frequency estimation has attracted a great deal of attention in the
statistics and engineering literature, and is still of prime concern at the present time.
Traditional approaches to this problem are based on the Fourier transform. A typical
example is periodogram analysis (Whittle, 1952). This method is able to provide
satisfactory results in many cases. In particular, when the frequencies of the
sinusoids are well separated, periodogram analysis, which locates the local maxima in the
periodogram as a continuous function of a frequency variable, is capable of producing
consistent frequency estimates with asymptotic standard deviation of order $n^{-3/2}$
(Walker, 1971), where $n$ is the data length. Despite its high accuracy, periodogram
analysis is computationally intensive, since it requires not only an iterative routine to
obtain the optimal frequency estimates, but also a certain exhaustive search on a local
mesh in order to provide initial guesses of accuracy $o(n^{-1})$, needed for the iterative
routine to converge to the optimal solution (Rice and Rosenblatt, 1988). Furthermore,
in order to achieve consistency, the sinusoidal frequencies are required to be separated
from each other by a distance greater than $O(n^{-1})$ (Walker, 1971). In other words,
periodogram analysis is unable to resolve two frequencies closer than the reciprocal of
the data length. This resolution limit happens to be the most serious problem of the
Fourier approach (Kay and Marple, 1981).

An alternative way to avoid the nonlinear optimization in periodogram analysis
is to evaluate the periodogram only at a finite number of locations known as Fourier
frequencies. The resulting procedure, which we refer to as the DFT method¹, is
equivalent to transforming the data by the discrete Fourier transform and seeking the Fourier
frequencies that correspond to the largest absolute magnitudes of the DFT. The DFT
method is computationally efficient, as compared to periodogram analysis, since FFT
(fast Fourier transform) algorithms can be employed to compute the transform.
However, this reduction in computational complexity is not achieved without cost. In
fact, the estimation accuracy of the DFT method deteriorates from $n^{-3/2}$ to $n^{-1}$, due
to the focus on Fourier frequencies. The resolution limit clearly persists in DFT.
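
The DFT method described above can be sketched in a few lines. The following is only an illustrative sketch, not code from the dissertation: the function name `dft_frequency_estimates`, the local-maximum rule used to avoid counting one peak twice, and the test signal (two unit-amplitude sinusoids placed exactly on the Fourier grid, in white Gaussian noise) are all assumptions made for the example.

```python
import numpy as np

def dft_frequency_estimates(y, q):
    """Return the q Fourier frequencies (radians per sample) with the largest |DFT|."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    mag = np.abs(np.fft.rfft(y))            # magnitudes at bins 0, 1, ..., n // 2
    k = np.arange(1, len(mag) - 1)          # interior bins only
    # Keep local maxima so that neighbouring bins of one peak are not counted twice.
    is_peak = (mag[k] >= mag[k - 1]) & (mag[k] >= mag[k + 1])
    peaks = k[is_peak]
    top = peaks[np.argsort(mag[peaks])[-q:]]
    return np.sort(2.0 * np.pi * top / n)   # Fourier frequencies 2*pi*k/n

# Illustrative data (all values are assumptions): two sinusoids in white Gaussian noise.
rng = np.random.default_rng(0)
n = 200
t = np.arange(1, n + 1)
true_freqs = np.array([0.20 * np.pi, 0.55 * np.pi])   # both lie on the Fourier grid
y = np.cos(true_freqs[0] * t + 0.3) + np.cos(true_freqs[1] * t + 1.1)
y = y + 0.5 * rng.standard_normal(n)
est = dft_frequency_estimates(y, q=2)
```

Because the estimates are restricted to the grid $2\pi k/n$, a frequency lying between two grid points can be off by up to $\pi/n$, which is consistent with the $n^{-1}$ accuracy stated in the text.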

The so-called nonlinear least squares (NLS) method (or MLE in the case of white
Gaussian noise) is a procedure that has a higher resolution than periodogram analysis.
This procedure fits a sum of sinusoids to the data by minimizing the sum of squared
errors with respect to the amplitudes, phases, as well as frequencies of the sinusoids.
It turns out that the NLS method not only has a higher resolution than periodogram
analysis, but also achieves similar estimation accuracy (Hannan and Quinn, 1989).
Unfortunately, the NLS method suffers from the same problems of high computational
complexity as periodogram analysis, perhaps to a greater degree. Indeed, the NLS method
also requires a certain iterative routine that must be started with initial guesses of
accuracy $o(n^{-1})$, in order to obtain the optimal solution (Rice and Rosenblatt, 1988;
Hannan and Quinn, 1989).

¹ It is sometimes referred to as periodogram analysis in the statistics literature.

The trade-off between high accuracy/resolution and low computational
complexity has been recognized in the literature of frequency estimation for many years. In
applications, as long as the computational burden is not a major problem,
satisfactory frequency estimates with high accuracy and high resolution can almost always
be obtained by certain nonlinear procedures such as the NLS method. This requires
some rough initial estimation via, for example, the DFT method, and an exhaustive
search over a fine grid around the rough estimates, followed by an iterative algorithm
starting with the refined estimates (Abatzoglou, 1985; Stoica, Moses, Friedlander, and
Soderstrom, 1989). The number of frequencies $q$ can also be determined by a certain
goodness-of-fit test, comparing the fitting errors for different values of $q$. However,
when “real-time” algorithms are required for the frequency estimation, these
computationally burdensome procedures are obviously out of the question. In these cases,
one has to rely on simple algorithms that can be implemented easily. Therefore, the
ultimate objective of frequency estimation in these applications is to seek high accuracy
and high resolution procedures that require low computational burden.

Most of the so-called modern approaches can be roughly divided into two broad
categories. In the first category, procedures are based upon the fact that the
sinusoidal signal satisfies a homogeneous autoregressive (AR) equation whose coefficients
are uniquely determined by the frequencies of the sinusoids. Using this fact, one is
able to transform the frequency estimation problem into an AR estimation problem to
which many computationally simple linear methods can be applied. Unlike the Fourier
approach, in which the data are implicitly assumed to be zero outside the observation
interval (an unrealistic assumption responsible for the resolution limit of the Fourier
approach; see Kay and Marple, 1981), the modern procedures extrapolate the data
beyond the observation interval by fitting parametric models to the measured data, and
thus provide higher resolution than the Fourier approach does. A widely-used
procedure in this category is Prony’s (spectral line) estimator (Hildebrand, 1956; Kay and
Marple, 1981) that fits an AR model to the noisy data by the least squares technique,
yielding a system of linear equations for the AR coefficients. In addition to providing
higher resolution, many procedures in this category can also be implemented
recursively, so that the frequency estimates can be easily updated when new data become
available. This property is extremely attractive in applications where instantaneous
tracking of time-varying frequencies is required, rather than batch processing.
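
To make the AR property concrete, the sketch below checks numerically that a noise-free sum of two sinusoids is annihilated by the polynomial $\prod_k (1 - 2\cos(\omega_k) z^{-1} + z^{-2})$, and that a least squares AR fit of order $2q$ recovers the frequencies from the angles of the polynomial roots. This is a hedged illustration under stated assumptions, not the dissertation's implementation: the helper names `ar_polynomial` and `prony_ls`, the test frequencies, amplitudes and phases are all hypothetical, and only the noise-free case is exercised (with noise, the least squares fit is biased, which the text discusses later).

```python
import numpy as np

def ar_polynomial(freqs):
    """Coefficients of prod_k (1 - 2 cos(w_k) z^-1 + z^-2), leading coefficient first."""
    a = np.array([1.0])
    for w in freqs:
        a = np.polymul(a, [1.0, -2.0 * np.cos(w), 1.0])
    return a

def prony_ls(y, q):
    """Least squares AR(2q) fit; frequency estimates are the positive root angles."""
    p = 2 * q
    y = np.asarray(y, dtype=float)
    # Regress y_t on (y_{t-1}, ..., y_{t-p}) for t = p, ..., n - 1 (0-based indexing).
    X = np.column_stack([y[p - j : len(y) - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    roots = np.roots(np.concatenate(([1.0], -coef)))
    angles = np.angle(roots)
    return np.sort(angles[angles > 0])

# Illustrative noise-free signal (values are assumptions).
t = np.arange(50)
true_freqs = np.array([0.8, 1.1])
x = np.cos(0.8 * t + 0.3) + 0.5 * np.cos(1.1 * t + 1.2)

a = ar_polynomial(true_freqs)
residual = np.convolve(x, a, mode="valid")   # sum_j a_j x_{t-j}, zero up to rounding
est = prony_ls(x, q=2)
```

The residual being zero (up to rounding) is the homogeneous AR equation; the frequencies enter only through the coefficients, which is why a linear fit can replace a nonlinear search.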

The other category consists of those procedures that are based upon the
eigenvalue decomposition of certain data matrices, of which the nonzero eigenvalues or the
associated eigenvectors determine the sinusoidal frequencies. To obtain the
decomposition, the singular-value decomposition (SVD) algorithms are usually employed, and
in some procedures the noisy data matrices are also cleaned up by annihilating small
eigenvalues in the decomposition. Details about these procedures can be found, for
example, in a book by Kay (1988) and a survey paper by Li (1991).

In this dissertation, we shall concentrate on the procedures in the first category
by following the ideas of AR modeling. In particular, we shall propose a general
approach of parametric filtering, which we refer to as the PF method, that improves Prony’s
estimator by eliminating its bias and increasing its accuracy, while inheriting its high
resolution property and computational simplicity. In the statistics and engineering
literature, the idea of parametric filtering has been applied to the problem of frequency
estimation (see, for example, Kay, 1984; Kedem, 1986; Dragosevic and Stankovic,
1989; He and Kedem, 1989; Truong-Van, 1990; Yakowitz, 1991; Quinn and Fernandes,
1991). In this dissertation, all these methods are unified under the framework of AR
estimation and extended to provide consistent and efficient frequency estimates which
require simple computations.

This dissertation is organized as follows. In the remaining part of Chapter 1, we
shall formulate more precisely the frequency estimation problem and review in detail
some of the existing procedures, such as the NLS method mentioned earlier, that are
closely related to the development and evaluation of the PF method. In Chapter 2,
we shall summarize the AR approach and discuss its asymptotic properties. As we
shall see, the asymptotic bias inherent in Prony’s estimator motivates the
development of the PF method. Chapter 3 provides some new limit theorems, including (i)
the strong uniform consistency of sample autocovariances after parametric filtering,
and (ii) the asymptotic normality of sample autocovariances. These results extend
in different aspects the well-known asymptotic theory of sample autocovariances for
continuous-spectrum processes, and lay the theoretical foundation for the statistical
analysis of the PF method in a later chapter. In Chapter 4, we shall present the
PF method and discuss its statistical properties, concerning (i) the existence of the
PF estimator, (ii) the convergence of an iterative algorithm that computes the PF
estimator, (iii) the strong consistency, and (iv) the asymptotic normality of the PF
estimator. This analysis points to the fact that the PF method is a highly effective
procedure for frequency estimation. In Chapter 5, we shall specialize the PF method
by considering a useful parametric filter, which we refer to as the AR filter (also known
as all-pole filter). Variations of this filter have been considered before in the
engineering literature (Matausek et al., 1983; Kay, 1984; Dragosevic and Stankovic, 1989), but
no statistical analysis has been done, especially for multiple sinusoids. In this work,
we apply the general principle of the PF method in connection with the AR filter and
investigate statistical properties of the resulting frequency estimates. We shall show
that significant improvements over the existing methods using similar filters can be
achieved by the PF estimator in terms of the sensitivity to initial guesses, estimation
accuracy, and resolution, especially for closely-spaced frequencies. Some simulation
results are provided at the end of this chapter to demonstrate the effectiveness of the
PF method.


1.2  Problem  Formulation 


The problem of frequency estimation is very well formulated in the literature. Suppose
that a time series $\{y_1, \ldots, y_n\}$ of length $n$ is observed from a random process $\{y_t\}$
which consists of $q$ superimposed real sinusoids $\{x_t\}$ contaminated by additive noise
$\{\epsilon_t\}$, namely,

$$y_t = x_t + \epsilon_t \quad\text{and}\quad x_t = \sum_{k=1}^{q} \beta_k \cos(\omega_k t + \phi_k) \qquad (t = 0, \pm 1, \pm 2, \ldots). \tag{1.1}$$

Assume in this expression that the number of sinusoids $q > 0$ is a known integer and
that the amplitudes $\beta_k$ and the frequencies $\omega_k$ are unknown constants, satisfying

$$\beta_k > 0 \quad\text{and}\quad 0 < \omega_1 < \cdots < \omega_q < \pi.$$

For convenience, we assume that the phases $\phi_k$ are independent and identically
distributed (i.i.d.) random variables with uniform distribution on the interval $[0, 2\pi)$.
In some literature (e.g., Walker, 1971) the phases are also assumed to be constants
instead of random variables. We find it convenient to assume that the phases are i.i.d.
and uniform random variables, and note that the asymptotic theory is not altered by
this assumption. The signal $\{x_t\}$ under this assumption becomes a zero-mean strictly
stationary process (Grenander and Rosenblatt, 1957, p. 30).

The noise $\{\epsilon_t\}$ is assumed to be independent of $\{\phi_k\}$ and hence of the signal $\{x_t\}$.
Moreover, in this dissertation, $\{\epsilon_t\}$ is modeled as a linear process of the form

$$\epsilon_t := \sum_{j=-\infty}^{\infty} \psi_j \xi_{t-j}, \qquad \{\xi_t\} \sim \mathrm{IID}(0, \sigma_\xi^2), \qquad \sum_{j=-\infty}^{\infty} |\psi_j| < \infty. \tag{1.2}$$

In some literature (e.g., Hannan, 1973; Quinn and Fernandes, 1991), $\{\xi_t\}$ is assumed
to be a martingale difference sequence. The i.i.d. assumption is made here only for
simplicity, so that $\{\epsilon_t\}$ is strictly stationary with continuous spectrum.

It is easy to verify that under the above assumptions about the signal and the
noise, the process $\{y_t\}$ becomes strictly stationary with mean zero and autocovariance
function

$$r_\tau^y := E(y_{t+\tau} y_t) = r_\tau^x + r_\tau^\epsilon \qquad (\tau = 0, \pm 1, \pm 2, \ldots) \tag{1.3}$$

where $r_\tau^x := E(x_{t+\tau} x_t)$ and $r_\tau^\epsilon := E(\epsilon_{t+\tau} \epsilon_t)$ are the autocovariance functions of the
signal and the noise, respectively, and can be written as

$$r_\tau^x = \frac{1}{2} \sum_{k=1}^{q} \beta_k^2 \cos(\omega_k \tau) \quad\text{and}\quad r_\tau^\epsilon = \sigma_\xi^2 \sum_{j=-\infty}^{\infty} \psi_j \psi_{j+\tau}. \tag{1.4}$$

These expressions will be used frequently in this dissertation.
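
As a rough numerical check of (1.3) and (1.4), the sketch below simulates one long realization of model (1.1) and compares sample autocovariances with the theoretical values. Everything specific here is an assumption chosen for illustration and is not taken from the dissertation: the function names, the two amplitudes and frequencies, the finite one-sided noise filter `psi = [1.0, 0.5]`, Gaussian innovations, and the divisor $n$ in the sample autocovariance.

```python
import numpy as np

def signal_autocov(tau, amps, freqs):
    """r^x_tau = (1/2) * sum_k beta_k^2 cos(omega_k * tau), as in (1.4)."""
    amps, freqs = np.asarray(amps), np.asarray(freqs)
    return 0.5 * np.sum(amps**2 * np.cos(freqs * tau))

def noise_autocov(tau, psi, sigma2):
    """r^eps_tau = sigma^2 * sum_j psi_j psi_{j+|tau|} for a finite, one-sided psi."""
    psi = np.asarray(psi, dtype=float)
    tau = abs(tau)
    if tau >= len(psi):
        return 0.0
    return sigma2 * np.sum(psi[: len(psi) - tau] * psi[tau:])

def sample_autocov(y, tau):
    """Sample autocovariance at lag tau >= 0 (mean known to be zero, divisor n)."""
    n = len(y)
    return np.sum(y[tau:] * y[: n - tau]) / n

# Illustrative parameters (assumptions).
rng = np.random.default_rng(1)
n = 50_000
amps, freqs = np.array([1.0, 0.5]), np.array([0.6, 1.4])
phases = rng.uniform(0.0, 2.0 * np.pi, size=2)        # i.i.d. uniform phases
psi, sigma2 = np.array([1.0, 0.5]), 1.0

t = np.arange(1, n + 1)
x = (amps * np.cos(np.outer(t, freqs) + phases)).sum(axis=1)
xi = np.sqrt(sigma2) * rng.standard_normal(n + len(psi) - 1)
eps = np.convolve(xi, psi, mode="valid")              # eps_t = sum_j psi_j xi_{t-j}
y = x + eps

lags = range(4)
theory = np.array([signal_autocov(k, amps, freqs) + noise_autocov(k, psi, sigma2)
                   for k in lags])
sample = np.array([sample_autocov(y, k) for k in lags])
```

Note that the signal part of the autocovariance does not decay with the lag, while the noise part vanishes beyond the length of the filter; this is the mixed-spectrum feature that the later limit theorems have to accommodate.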

The objective of frequency estimation is to find estimators of the sinusoidal
frequencies $\omega_k$ on the basis of the data set $\{y_1, \ldots, y_n\}$.

Notice that the estimation of the amplitudes and phases (when fixed) of the
sinusoids is also desirable in some applications. Clearly, estimating these quantities is
easier than estimating the frequencies. In fact, by representing
each sinusoid as a linear combination of both cosine and sine waves with the same
frequency, the estimation of its coefficients becomes a linear problem, provided that
the frequency estimates are available (see, e.g., Bresler and Macovski, 1986). For this
reason, we shall only concentrate on the frequency estimation problem in this work.
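
The linearization mentioned above rests on the identity $\beta \cos(\omega t + \phi) = A \cos(\omega t) + B \sin(\omega t)$ with $A = \beta \cos\phi$ and $B = -\beta \sin\phi$. A minimal sketch, assuming the frequencies are known exactly and using ordinary least squares, is given below; the function name `fit_amplitudes_phases` and the test values are hypothetical and serve only to illustrate the idea.

```python
import numpy as np

def fit_amplitudes_phases(y, freqs):
    """Linear least squares for amplitudes and phases, given the frequencies."""
    y = np.asarray(y, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    t = np.arange(1, len(y) + 1)
    # Design matrix with one cosine and one sine column per frequency.
    design = np.hstack([np.cos(np.outer(t, freqs)), np.sin(np.outer(t, freqs))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    q = len(freqs)
    A, B = coef[:q], coef[q:]
    # Invert A = beta*cos(phi), B = -beta*sin(phi).
    return np.hypot(A, B), np.arctan2(-B, A)

# Illustrative noise-free data (values are assumptions).
t = np.arange(1, 101)
freqs = np.array([0.5, 1.3])
true_amps = np.array([2.0, 0.7])
true_phases = np.array([0.4, -1.0])
y = (true_amps * np.cos(np.outer(t, freqs) + true_phases)).sum(axis=1)

amps_hat, phases_hat = fit_amplitudes_phases(y, freqs)
```

The recovered phases lie in $(-\pi, \pi]$; with estimated rather than exact frequencies the same regression applies, but its accuracy then depends on the accuracy of the frequency estimates.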

As remarked earlier, the number of frequencies $q$ is assumed to be known a priori.
When it is unknown, there are several methods available in the literature that can be
used to estimate this number. Some of these methods are based on AIC-like criteria
and others on the eigenvalue decomposition of the covariance matrix of the data. Details
concerning this matter can be found, for example, in Kay (1988) and Fuchs (1988).

1.3  Periodogram  Analysis  and  DFT 

As briefly mentioned in Section 1.1, one of the traditional Fourier-transform-based
procedures is periodogram analysis, proposed by Whittle (1952) as an approximation
to the nonlinear least squares method (Walker, 1971).

The periodogram of the time series $\{y_1, \ldots, y_n\}$ is defined by

$$P_n(\omega) := \frac{1}{n} \left| \sum_{t=1}^{n} y_t \exp(-it\omega) \right|^2 \tag{1.5}$$

as a continuous function of the frequency variable $\omega \in [0, \pi]$. Periodogram analysis is a
method of frequency estimation that seeks $q$ extremum points $0 < \hat\omega_1 < \cdots < \hat\omega_q < \pi$
that correspond to the largest values in the periodogram $P_n(\omega)$. This can be done by
maximizing the sum

$$S_n := \sum_{k=1}^{q} P_n(\omega_k) \tag{1.6}$$

with respect to $\{\omega_k\}$. Clearly, if no other restrictions are imposed on this problem, the
$q$ maxima would cluster at the global maximum of $P_n(\omega)$, yielding incorrect frequency
estimates. To remedy this problem, Walker (1971) introduced the following separation
condition²

$$\begin{cases} |\omega_k - \omega_{k'}| > c_n/n & \forall\ 1 \le k' < k \le q, \\ c_n/n < \omega_k < \pi - c_n/n & \forall\ k, \end{cases} \qquad \text{where } c_n \to \infty,\ c_n/n \to 0. \tag{1.7}$$

This condition simply says that the sinusoidal frequencies must be separated from
each other and from $\{0, \pi\}$ by a distance greater than $O(n^{-1})$. In particular, when
two frequencies are closer than $n^{-1}$, the separation condition is clearly violated, and
hence the estimation accuracy by periodogram analysis is no longer guaranteed. This
is known as the resolution limit of periodogram analysis.

Large  sample  properties  of  periodogram  analysis  have  been  investigated  by  Walker 
(1971,  1973)  and  Hannan  (1973).  The  following  theorem  summarizes  their  results 
regarding  the  frequency  estimates. 


Theorem 1.1 Let $\{y_1, \ldots, y_n\}$ be a time series observed from (1.1) and $P_n(\omega)$ be its
periodogram given by (1.5). Suppose that the frequency estimates $\hat\omega_k$ maximize $S_n$ in
(1.6) under the separation condition (1.7). Then, the $\hat\omega_k$ are consistent in the sense
that $\hat\omega_k - \omega_k \xrightarrow{a.s.} 0$ $(k = 1, \ldots, q)$ as $n \to \infty$. They are also asymptotically jointly
independent and normally distributed such that $n^{3/2}(\hat\omega_k - \omega_k) \xrightarrow{D} N(0, 12/\gamma_k)$, where

$$\gamma_k := \frac{\beta_k^2 / 2}{\sigma_\xi^2 \left| \sum_{j} \psi_j \exp(ij\omega_k) \right|^2}$$

is the signal-to-noise ratio (SNR) of the $k$th sinusoid.

² In Walker (1971), the separation condition was given in a slightly different form, and the
frequencies were not explicitly required to stay away from 0 and $\pi$.

In this theorem the asymptotic standard deviation of the frequency estimates is
shown to be of order $n^{-3/2}$, as compared to $n^{-1/2}$ for many standard estimators. This
high accuracy, however, is not surprising. Notice that in the frequency domain the
power of the sinusoidal signal concentrates only at a finite number of locations and
all other power components are due to the noise. It is therefore possible to clean up
the noise considerably by using a certain bandpass filter that passes the sinusoids and
suppresses the noise.

This, in fact, explains the high accuracy of periodogram analysis. Indeed, for
any fixed $\omega_0$, let us consider a linear filter, the complex exponential filter, whose
impulse response $\{h_j\}$ is given by $h_j := \exp(i(j+1)\omega_0)$ for $j = 0, 1, \ldots, n-1$, and
$h_j := 0$ elsewhere. Applying this filter to the data yields the output sequence

$$y_t(\omega_0) := \sum_{j=1}^{t} y_j h_{t-j} = \sum_{j=1}^{t} y_j \exp(i(t-j+1)\omega_0) \qquad (t = 1, \ldots, n). \tag{1.8}$$

Note that $y_t$ is assumed to be zero for $t \le 0$ in the filtering. From (1.5) and (1.8), it
is easy to verify that

$$P_n(\omega_0) = n^{-1} |y_n(\omega_0)|^2. \tag{1.9}$$

That is, the periodogram at $\omega_0$ is proportional to the squared magnitude of the output
$y_t(\omega_0)$ when $t = n$.
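
The identity (1.9) is easy to confirm numerically. The sketch below evaluates the periodogram directly from its definition and, separately, runs the data through the complex exponential filter as a causal convolution, then compares the two at $t = n$. The function names and the random test data are assumptions for illustration only; the identity itself holds for any data vector.

```python
import numpy as np

def periodogram(y, omega):
    """P_n(omega) = (1/n) * |sum_{t=1}^n y_t exp(-i t omega)|^2, as in (1.5)."""
    n = len(y)
    t = np.arange(1, n + 1)
    return np.abs(np.sum(y * np.exp(-1j * t * omega))) ** 2 / n

def complex_exponential_filter(y, omega0):
    """Output y_t(omega0) = sum_{j=1}^t y_j h_{t-j}, t = 1..n, as in (1.8)."""
    n = len(y)
    h = np.exp(1j * (np.arange(n) + 1) * omega0)   # h_0, ..., h_{n-1}
    return np.convolve(y, h)[:n]                   # causal filtering, zero initial state

# Arbitrary test data (assumption): the identity does not depend on the data.
rng = np.random.default_rng(2)
y = rng.standard_normal(64)
omega0 = 0.9
out = complex_exponential_filter(y, omega0)
lhs = periodogram(y, omega0)
rhs = np.abs(out[-1]) ** 2 / len(y)
```

Computing the periodogram this way is of course not efficient; the point is only that a periodogram ordinate is the final output power of a narrow-band filter tuned to $\omega_0$.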

Notice  that  the  squared  gain  of  the  complex  exponential  filter  {hj)  can  be  written 


9 


as 

sin2(n(cj  —  u>n)/2)  r  , 

.  j1./ - V Ls  for  u  ^  ujQ 

Ga(u)  =  sm  ((“>  -  wo)/2)  (1.10) 

n2  for  u  =  Uq. 

Given any $\delta > 0$, it is easy to see that $G_n(\omega) = o(n^{\delta})$ for all $\omega \ne \omega_0$ (but of course
this "o" is not uniform in $\omega \ne \omega_0$). Figure 1.1 shows the normalized squared gain
$n^{-2} G_n(\omega)$ with $\omega_0 = 0.5\pi$ as a function of the normalized frequency $f := \omega/\pi$. Clearly,
for sufficiently large $n$, $\{h_j\}$ is a bandpass filter with a narrow pass band around
$\omega_0$. Therefore, if the filter locks on one of the sinusoids, the locked sinusoid will be
significantly enhanced and the periodogram is expected to take a large value of order
$O(n)$, as can be seen from (1.9) and (1.10). The output sequence $\{y_1(\omega_0), \dots, y_n(\omega_0)\}$
in this case consists basically of the locked sinusoid plus some filtered noise which
has a considerably smaller variance than in the original data. Being the sum of the
variances of the locked sinusoid and the filtered noise, the variance of $y_n(\omega_0)$, which is
proportional to the expected value of the periodogram, is therefore significantly larger
than in the case where no sinusoids are captured by the filter, typically $O(n)$ versus
$O(1)$ (see, e.g., Priestley, 1981). For this reason, the locations where $P_n(\omega)$ assumes
the largest values produce very accurate frequency estimates.
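As an illustration of (1.10), the sketch below compares the closed-form squared gain with a direct evaluation of $|\sum_{j=0}^{n-1} h_j \exp(-ij\omega)|^2$. The function names are hypothetical, and the transfer-function convention $H(\omega) = \sum_j h_j \exp(-ij\omega)$ is an assumption on our part.

```python
import cmath
import math

def squared_gain_direct(omega, omega0, n):
    """|sum_{j=0}^{n-1} h_j exp(-i j omega)|^2 with h_j = exp(i (j + 1) omega0)."""
    s = sum(cmath.exp(1j * (j + 1) * omega0) * cmath.exp(-1j * j * omega)
            for j in range(n))
    return abs(s) ** 2

def squared_gain_closed(omega, omega0, n):
    """Closed form (1.10); the value n^2 is used at omega = omega0."""
    d = omega - omega0
    if abs(math.sin(d / 2)) < 1e-12:
        return float(n * n)
    return math.sin(n * d / 2) ** 2 / math.sin(d / 2) ** 2

omega0 = 0.5 * math.pi
# Normalized squared gain n^{-2} G_n at a frequency away from omega0.
off_peak_n10 = squared_gain_closed(0.8 * math.pi, omega0, 10) / 10 ** 2
off_peak_n100 = squared_gain_closed(0.8 * math.pi, omega0, 100) / 100 ** 2
```

Away from $\omega_0$ the normalized gain is bounded by $\{n \sin((\omega - \omega_0)/2)\}^{-2}$, which is the narrowing of the pass band seen in Figure 1.1.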

Furthermore, the behavior of the squared gain $G_n(\omega)$ also explains the resolution
limit of periodogram analysis. Notice that the effective bandwidth of $G_n(\omega)$ is
roughly of order $O(n^{-1})$ (see Figure 1.1). It is therefore very difficult for the complex
exponential filter to distinguish two frequencies that are closer than the bandwidth,
since in this case the filter tends to either enhance or suppress both frequencies
simultaneously. The separation condition (1.7) can be interpreted as a requirement that
prevents this situation from happening. Figure 1.2 illustrates the resolution limit of
the periodogram. In this example, the periodogram $P_n(\omega)$, plotted again as a function
of the normalized frequency $f$, was computed for a time series of length $n = 100$ which
consists of four sinusoids in additive white Gaussian noise. Two frequencies of the
sinusoids ($\omega_1 = 0.31\pi$ and $\omega_2 = 0.35\pi$) are well-separated, while the others ($\omega_3 = 0.713\pi$
and $\omega_4 = 0.725\pi$) are closely-spaced with a distance $0.012\pi$ ($\approx \pi/n$). The phase of
each sinusoid is zero and the SNR is 3 dB. Clearly, the closely-spaced frequencies are
not resolved by the periodogram.

Figure 1.1: Plot of the normalized squared gain $n^{-2} G_n(\omega)$ of the complex exponential
filter with $\omega_0 = 0.5\pi$ for $n = 10$ (dashed curve) and $n = 100$ (solid curve).

From the computational point of view, periodogram analysis is a highly nonlinear
procedure. Certain iterative routines, such as the Newton-Raphson algorithm, are
therefore necessary in order to obtain the optimal estimates that maximize $P_n(\omega)$. As
pointed out by Rice and Rosenblatt (1988), finding the global maxima in the
periodogram is not an easy job. Since there are many local maxima with a separation
in frequency of about $O(n^{-1})$, as can be seen in Figure 1.2, one must start the iterative
routines with initial guesses that are very close to the true frequencies, in order to
converge to the desired solution. Initial guesses of accuracy $o(n^{-1})$ are typically required
for the convergence. It is usually suggested that the DFT method be used first to yield
some rough frequency estimates, followed by an exhaustive search on a fine
grid around the rough estimates to produce the desired initial guesses (Abatzogolou,
1985; Rice and Rosenblatt, 1988; Stoica, Moses, Friedlander, and Soderstrom, 1989).
Using a large number of random initial values was also suggested in the literature
(Rice and Rosenblatt, 1988).

Figure 1.2: Periodogram of four sinusoids in white noise with $n = 100$. The SNR is
3 dB per sinusoid, and the true frequencies are indicated by dashed lines.

Of course, the computational complexity can be reduced considerably by evaluating
the periodogram only at Fourier frequencies of the form $2\pi j/n$ for $j = 0, 1, \dots, n-1$.
In fact, $P_n(2\pi j/n)$ is proportional to the squared magnitude of the discrete Fourier
transform (DFT) of the data and can be efficiently computed with the help of FFT (fast
Fourier transform) algorithms. In so doing, periodogram analysis reduces to the DFT method
that seeks the Fourier frequencies which correspond to the largest values of $P_n(2\pi j/n)$.

Despite its computational simplicity, the DFT method does not have the same
estimation accuracy as periodogram analysis. In fact, its accuracy is of order $n^{-1}$,
instead of $n^{-3/2}$, since only Fourier frequencies, which are discrete and separated by
$2\pi/n$, are considered. Moreover, the resolution limit of the DFT method is similar to
that of periodogram analysis. Figure 1.3 presents the DFT magnitude for the same
time series as used in Figure 1.2.
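The DFT method can be sketched as follows for a single noise-free sinusoid. This is only an illustration under our own assumptions: the function name is hypothetical, the search is restricted to Fourier frequencies in $(0, \pi)$, and a naive $O(n^2)$ transform is used in place of an FFT to keep the example dependency-free.

```python
import cmath
import math

def dft_peak_frequency(y):
    """Return the Fourier frequency 2*pi*j/n in (0, pi) that maximizes P_n."""
    n = len(y)
    best_j, best_p = 1, -1.0
    for j in range(1, n // 2):
        omega = 2 * math.pi * j / n
        s = sum(y[t] * cmath.exp(-1j * omega * (t + 1)) for t in range(n))
        p = abs(s) ** 2 / n          # periodogram ordinate at 2*pi*j/n
        if p > best_p:
            best_j, best_p = j, p
    return 2 * math.pi * best_j / n

n = 100
omega_true = 0.3137 * math.pi        # deliberately not a Fourier frequency
y = [math.cos(omega_true * t) for t in range(1, n + 1)]
omega_hat = dft_peak_frequency(y)
grid_spacing = 2 * math.pi / n
```

Because the estimate is confined to the grid, its error cannot be driven below a fraction of the spacing $2\pi/n$ no matter how clean the data are, which is the $n^{-1}$ accuracy noted above.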




Figure  1.3:  Plot  of  DFT  magnitude  for  the  same  data  as  in  Figure  1.2. 

1.4  Nonlinear  Least  Squares 

Closely  related  to  periodogram  analysis  is  the  nonlinear  least  squares  (NLS)  method 
that  has  been  considered  from  different  aspects  by  many  researchers  (see,  for  example, 
Walker,  1971;  Hannan,  1973;  Rife  and  Boorstyn,  1974,  1976;  Rice  and  Rosenblatt, 
1988;  Stoica  and  Nehorai,  1989). 

The NLS method is conceptually very simple: it fits a sum of $q$ sinusoids to
the data $\{y_1, \dots, y_n\}$ by minimizing the sum of squared errors with respect to the
amplitudes, phases, and frequencies of the sinusoids. More precisely, the NLS
method minimizes the criterion

$$J_n := \sum_{t=1}^{n} \Big\{ y_t - \sum_{k=1}^{q} \beta_k \cos(\omega_k t + \phi_k) \Big\}^2 \tag{1.11}$$

with respect to $\{\beta_k, \phi_k, \omega_k\}$. Clearly, it is a highly nonlinear optimization problem.

It is interesting to note that once the NLS frequency estimates are available, one
can easily obtain the NLS estimates of the amplitudes and phases. In fact, if we
define $A_k := \beta_k \cos\phi_k$ and $B_k := -\beta_k \sin\phi_k$, then $\beta_k \cos(\omega_k t + \phi_k)$ can be written as
$A_k \cos(\omega_k t) + B_k \sin(\omega_k t)$. By this reparametrization, one can minimize $J_n$ with respect
to $\{A_k, B_k, \omega_k\}$ instead of $\{\beta_k, \phi_k, \omega_k\}$. The resulting estimates of $A_k$ and $B_k$ can be
transformed to yield the NLS estimates of $\beta_k$ and $\phi_k$, by using $\beta_k = \sqrt{A_k^2 + B_k^2}$ and
$\phi_k = -\arctan(B_k/A_k)$. The NLS estimates of $A_k$ and $B_k$ can be easily obtained by
linear methods. In fact, it is readily shown that for any fixed $\omega_k$ the minimizer of
$J_n$ with respect to $[A_1, \dots, A_q, B_1, \dots, B_q]^T$ is given by $(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{y}$, where
$\mathbf{y} := [y_1, \dots, y_n]^T$ is the data vector and

$$\mathbf{G} := \begin{bmatrix}
\cos\omega_1 & \cdots & \cos\omega_q & \sin\omega_1 & \cdots & \sin\omega_q \\
\vdots & & \vdots & \vdots & & \vdots \\
\cos n\omega_1 & \cdots & \cos n\omega_q & \sin n\omega_1 & \cdots & \sin n\omega_q
\end{bmatrix}. \tag{1.12}$$

Substituting these optimal $A_k$ and $B_k$ in $J_n$ yields

$$J_n' := \|\mathbf{y} - \mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{y}\|^2 = \mathbf{y}^T\{\mathbf{I} - \mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\}\mathbf{y}. \tag{1.13}$$

Therefore, to obtain the NLS frequency estimates, it remains to minimize $J_n'$ with
respect to the frequencies $\omega_k$. A similar argument for complex sinusoids can be found
in Bresler and Macovski (1986). It is worth pointing out that the NLS estimator of
$[A_1, \dots, A_q, B_1, \dots, B_q]^T$ obtained by substituting the NLS frequency estimates in $\mathbf{G}$
is consistent and asymptotically normal with the usual normalizing factor $n^{1/2}$ (see,
e.g., Stoica and Nehorai, 1989).
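The linear step can be sketched as follows: for fixed frequencies, build $\mathbf{G}$ as in (1.12), solve the normal equations, and map $(A_k, B_k)$ back to $(\beta_k, \phi_k)$. The helper names are hypothetical, the small Gaussian elimination routine stands in for a library solver, and `atan2(-B, A)` is used in place of $-\arctan(B_k/A_k)$ so that the quadrant is resolved when $A_k < 0$ (our choice, not the text's).

```python
import math

def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x

def amplitudes_phases(y, omegas):
    """Least squares amplitudes and phases for fixed frequencies, via (1.12)."""
    n, q = len(y), len(omegas)
    G = [[math.cos(t * w) for w in omegas] + [math.sin(t * w) for w in omegas]
         for t in range(1, n + 1)]
    GtG = [[sum(G[t][i] * G[t][j] for t in range(n)) for j in range(2 * q)]
           for i in range(2 * q)]
    Gty = [sum(G[t][i] * y[t] for t in range(n)) for i in range(2 * q)]
    coef = solve_linear(GtG, Gty)          # (G^T G)^{-1} G^T y
    A, B = coef[:q], coef[q:]
    betas = [math.hypot(a, b) for a, b in zip(A, B)]
    phis = [math.atan2(-b, a) for a, b in zip(A, B)]
    return betas, phis

# Hypothetical noise-free example with two sinusoids.
omegas = [0.31 * math.pi, 0.35 * math.pi]
true_betas, true_phis = [1.0, 0.5], [0.3, -1.2]
y = [sum(b * math.cos(w * t + p) for b, w, p in zip(true_betas, omegas, true_phis))
     for t in range(1, 101)]
betas, phis = amplitudes_phases(y, omegas)
```

With the true frequencies and no noise, the amplitudes and phases are recovered up to rounding error; in practice the frequencies would come from the nonlinear search over $J_n'$.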

1.4.1  For  Well-Separated  Frequencies 

When the separation condition (1.7) holds, the NLS frequency estimates, denoted by
$\hat\omega_k$, can be shown to possess the same asymptotic properties as those from periodogram
analysis (Walker, 1971). In other words, Theorem 1.1 remains true (with the same
variances) for the $\hat\omega_k$ that minimize $J_n$ in (1.11).

In fact, periodogram analysis was proposed by Whittle (1952) as an approximation
to NLS (Walker, 1971). To see this, let us expand $J_n$ in (1.11) and write $J_n = U_n + R_n$,
where

$$U_n := \sum_{t=1}^{n} y_t^2 + \frac{n}{2} \sum_{k=1}^{q} \beta_k^2 - 2 \sum_{t=1}^{n} \sum_{k=1}^{q} y_t \beta_k \cos(\omega_k t + \phi_k),$$

$$R_n := \sum_{t=1}^{n} \sum_{k,k'=1}^{q} \beta_k \beta_{k'} \cos(\omega_k t + \phi_k) \cos(\omega_{k'} t + \phi_{k'}) - \frac{n}{2} \sum_{k=1}^{q} \beta_k^2.$$

For any fixed $\omega_k$, suppose that $U_n$ is minimized by $\{\hat\beta_k, \hat\phi_k\}$. Then, it is easy to
verify by differentiating $U_n$ with respect to $\{\beta_k, \phi_k\}$ that the following equations must
be satisfied:

$$\sum_{t=1}^{n} y_t \sin(\omega_k t + \hat\phi_k) = 0 \quad \text{and} \quad \sum_{t=1}^{n} y_t \cos(\omega_k t + \hat\phi_k) = \frac{n}{2} \hat\beta_k.$$

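Using these results, the value of $U_n$ corresponding to $\{\hat\beta_k, \hat\phi_k\}$ can be written as

$$U_n' = \sum_{t=1}^{n} y_t^2 - \frac{n}{2} \sum_{k=1}^{q} \hat\beta_k^2 = \sum_{t=1}^{n} y_t^2 - 2 \sum_{k=1}^{q} P_n(\omega_k),$$

where the second equality uses $(n/2)\hat\beta_k = |\sum_{t=1}^{n} y_t \exp(-it\omega_k)|$, so that the
periodogram ordinates enter only through the sum $S_n$ defined by (1.6). Clearly, the
frequency estimates produced by minimizing $U_n$ with respect to $\{\beta_k, \phi_k, \omega_k\}$ can be
obtained directly by minimizing $U_n'$, or, equivalently, by maximizing $S_n$. This is to say
that minimizing $U_n$ yields the same frequency estimates as periodogram analysis.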

It is now sufficient to show that $R_n$ is asymptotically negligible, so that minimizing
$J_n$ will be asymptotically equivalent to minimizing $U_n$, and hence the frequency
estimates by NLS will have the same asymptotic properties as those by periodogram
analysis. This can be done under the separation condition (1.7). To do so, we first
need the following results.


Lemma 1.1 Let $\{\lambda_n\}$ be a sequence defined in the interval $(0, \pi)$ such that
$c_n/n < \lambda_n < \pi - c_n/n$, where $c_n \to \infty$ and $c_n/n \to 0$. Then, we have
$\sin(n\lambda_n)/\sin\lambda_n = o(n)$. Moreover, the separation condition (1.7) implies that

$$\frac{\sin(n(\omega_k + \omega_{k'})/2)}{\sin((\omega_k + \omega_{k'})/2)} = o(n) \quad \text{and} \quad \frac{\sin(n(\omega_k - \omega_{k'})/2)}{\sin((\omega_k - \omega_{k'})/2)} = o(n),$$

where the first equality holds for all $k$ and $k'$, and the second for all $k \ne k'$.




Proof. Let $Q_n := |\sin(n\lambda_n)/(n \sin\lambda_n)|$, then it suffices to show that $Q_n = o(1)$.
Notice that $Q_n \le (n \sin\lambda_n)^{-1}$ and hence $\limsup Q_n \le \limsup\, (n \sin\lambda_n)^{-1}$. On the
other hand, since $c_n/n < \lambda_n < \pi - c_n/n$ and $c_n/n \to 0$, we obtain $\sin\lambda_n > \sin(c_n/n) > 0$
for large $n$ and $\sin(c_n/n)/(c_n/n) \to 1$. Therefore, it follows that

$$\limsup\, (n \sin\lambda_n)^{-1} \le \limsup\, \{n \sin(c_n/n)\}^{-1} = \lim c_n^{-1} = 0.$$

Combining these results gives $\limsup Q_n = 0$, i.e., $Q_n = o(1)$. The remaining proof
can be established by considering $\lambda_n = |\omega_k - \omega_{k'}|/2$ and $\lambda_n = (\omega_k + \omega_{k'})/2$, respectively.

Now, using the trigonometric identity

$$\cos\lambda \cos\lambda' = \tfrac{1}{2}\{\cos(\lambda - \lambda') + \cos(\lambda + \lambda')\} \tag{1.14}$$

we can write

$$2R_n = \sum_{\substack{k,k'=1 \\ k \ne k'}}^{q} \beta_k \beta_{k'} \sum_{t=1}^{n} \cos((\omega_k - \omega_{k'})t + (\phi_k - \phi_{k'}))
+ \sum_{k,k'=1}^{q} \beta_k \beta_{k'} \sum_{t=1}^{n} \cos((\omega_k + \omega_{k'})t + (\phi_k + \phi_{k'})).$$

It is easy to verify, upon noting that $\cos\omega = \{\exp(i\omega) + \exp(-i\omega)\}/2$, that

$$\Big| \sum_{t=1}^{n} \cos((\omega_k \pm \omega_{k'})t + (\phi_k \pm \phi_{k'})) \Big|^2 \le \frac{\sin^2(n(\omega_k \pm \omega_{k'})/2)}{\sin^2((\omega_k \pm \omega_{k'})/2)}.$$

Therefore, by Lemma 1.1, we obtain $R_n = o(n)$. On the other hand, it can be shown
that $U_n = O(n)$. This implies that $R_n$ is asymptotically negligible as compared with
$U_n$, and hence indicates that NLS can be replaced by periodogram analysis without
loss of asymptotic accuracy. A derivation of the asymptotic covariance matrix of the
NLS estimator can be found in Stoica and Nehorai (1989).


1.4.2  For  Closely-Spaced  Frequencies 

As seen above, the negligibility of $R_n$ in $J_n$ is based entirely upon the separation
condition (1.7) that guarantees Lemma 1.1. When this condition fails, $R_n$ is no longer
negligible. As an example, let us consider in this subsection the case of $q = 2$, where
$\omega_1$ is a constant while $\omega_2 = \omega_1 + \delta/n$ for some constant $\delta > 0$ (Hannan and Quinn,
1989). In this case $\omega_2 - \omega_1 = O(n^{-1})$ and hence the separation condition (1.7) is not
satisfied. Moreover, it can be shown that $R_n$ has the same order of magnitude as $U_n$,
i.e., $R_n = O(n)$, instead of $o(n)$, since

$$\lim_{n \to \infty} n^{-1} \sum_{t=1}^{n} \cos(\omega_1 t) \cos(\omega_2 t) = \frac{\sin\delta}{2\delta}.$$
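The limit above is easy to examine numerically. The sketch below evaluates the normalized sum for increasing $n$ with $\omega_2 = \omega_1 + \delta/n$; the values of $\omega_1$ and $\delta$ and the function name are arbitrary choices made for illustration.

```python
import math

def normalized_cross_sum(omega1, delta, n):
    """n^{-1} sum_{t=1}^n cos(omega1 t) cos(omega2 t) with omega2 = omega1 + delta/n."""
    omega2 = omega1 + delta / n
    return sum(math.cos(omega1 * t) * math.cos(omega2 * t)
               for t in range(1, n + 1)) / n

omega1, delta = 0.713 * math.pi, 1.5
limit = math.sin(delta) / (2 * delta)
values = {n: normalized_cross_sum(omega1, delta, n) for n in (100, 1000, 10000)}
```

The difference-frequency term behaves like a Riemann sum for $\frac{1}{2}\int_0^1 \cos(\delta x)\,dx$, while the sum-frequency term is $O(n^{-1})$, so the values settle near $\sin\delta/(2\delta)$ rather than near zero.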

Consequently, periodogram analysis is no longer a good approximation to NLS. In
fact, for closely-spaced frequencies, as in our example, the NLS method provides better
frequency estimates than periodogram analysis does. In a recent paper by Hannan and
Quinn (1989), it was shown that, in our example of two closely-spaced frequencies,
Theorem 1.1 still holds for the NLS estimates, obtained by minimizing $J_n$ subject to a
relaxed separation condition, except that asymptotic variances are different from (in
fact larger than) those for well-separated frequencies. The relaxed separation condition
was obtained by replacing the quantity $c_n$ in (1.7) with a less restrictive one of the
form

$$c_n := \kappa_n \sqrt{\log n / n}, \quad \text{where } \kappa_n \to \infty \text{ and } \kappa_n \sqrt{\log n / n} \to 0.$$

Notice that in this case $c_n$ tends to zero, instead of infinity, so that frequencies are
allowed to stay within a distance closer than or equal to $O(n^{-1})$. In this sense, the
NLS method provides a higher resolution than periodogram analysis for frequency
estimation.


1.4.3  Computational  Considerations 

The nonlinear least squares method, as its name implies, is a highly nonlinear
optimization problem. It requires not only iterative routines to obtain the optimal frequency
estimates, but also very accurate initial guesses for the iterative routines to converge
to the optimal solution.

Figure 1.4: Contour plot of $J_n$ for closely-spaced sinusoids, centered at the true frequencies.

As pointed out by Rice and Rosenblatt (1988), the criterion $J_n$ has many local
minima separated in frequency by $O(n^{-1})$, similar to the situation in periodogram
analysis. Therefore, in order for the iterative routines, such as the Newton-Raphson
algorithm, to converge to the global minimum point of $J_n$, initial frequency estimates
of accuracy $o(n^{-1})$ are typically required.

For well-separated frequencies, periodogram analysis is the preferred method for
frequency estimation since it reduces the multi-dimensional optimization problem of
NLS to several independent one-dimensional optimization problems. However, as
discussed in the previous section, maximizing the periodogram $P_n(\omega)$ instead of minimizing
$J_n$ is still not an easy task. In fact, an iterative routine that starts with a certain
exhaustive search is unavoidable in order to obtain the optimal solution. For
closely-spaced frequencies, one has to return to the use of $J_n$ since the periodogram does not
resolve closely-spaced frequencies. This, again, requires an iterative routine plus an
exhaustive search procedure (Hannan and Quinn, 1989).

Figure 1.4 illustrates the situation by an example of two sinusoids in additive white
Gaussian noise with closely-spaced frequencies. The data length is $n = 100$, and the
SNR is 3 dB per sinusoid. The contour plot of $-\log J_n$ is shown in Figure 1.4 as a
function of the frequency variable $(\omega_1, \omega_2)$ over the region $[0.613\pi, 0.813\pi] \times [0.625\pi, 0.825\pi]$.
In calculating $J_n$, the exact values, instead of estimates, are used for the amplitudes
and phases. The true frequencies are $\omega_1 = 0.713\pi$ and $\omega_2 = 0.725\pi$, shown as the
center of the plot. In this plot, a peak of width $O(n^{-1})$ is clearly seen near the
center, indicating that very accurate estimates can be obtained by minimizing $J_n$. On
the other hand, many local maxima with a separation of $O(n^{-1})$ are also seen in the
plot. Therefore, initial guesses of accuracy $o(n^{-1})$ must be used in order to guarantee
that gradient-based algorithms will converge to the optimal solution.

Table 1.1: Summary of Estimation Methods

Method   Accuracy        Resolution                      Complexity      Initial Guesses
PA       $O(n^{-3/2})$   $O(n^{-1})$                     nonlinear       $o(n^{-1})$
NLS      $O(n^{-3/2})$   $O(n^{-1}\sqrt{\log n / n})$    nonlinear       $o(n^{-1})$
DFT      $O(n^{-1})$     $O(n^{-1})$                     $O(n \log n)$   *

1.5  Summary 

As seen in the previous sections, nonlinear methods, such as periodogram analysis (PA)
and NLS, are able to provide very accurate frequency estimates but at the expense
of high computational complexity. On the other hand, the DFT method is
computationally simple but its accuracy and resolution cannot match the NLS method. These
facts are briefly summarized in Table 1.1.

Since there are applications (e.g., radar and sonar) where frequency estimators
with high accuracy and high resolution are demanded while the computation resources
are limited, many researchers have been motivated to seek alternative approaches
suitable for these situations. Improving the resolution of periodogram analysis and
the DFT method is crucial to many applications. It has been understood that
the main reason for the resolution limit of the periodogram is that the data outside
the observation interval are implicitly assumed to be zero in the computation of the
periodogram. This is well reflected in the definition (1.8) of the output sequence
$\{y_t(\omega_0)\}$ whose last value with $t = n$ defines the periodogram. Due to this practical
assumption, the energy of a sinusoid, which theoretically concentrates at a single
frequency, will spread all over the nearby frequency components in the periodogram,
making it difficult for the periodogram to resolve closely-spaced frequencies. This
phenomenon is known as spectrum leakage (Kay and Marple, 1981).

In the next chapter, we shall review and analyze an alternative approach, called
the AR method, that extrapolates the data beyond the observation interval by fitting
an autoregressive (AR) model to the data, and thus hopefully provides a higher
resolution.




Chapter  2 


Autoregressive  Estimation 


In this chapter, we shall discuss the autoregressive (AR) approach for frequency
estimation. This method is widely used in spectral analysis because of its computational
simplicity and high resolution property. As we shall see, however, the AR approach
leads to biased and hence inconsistent frequency estimates.


2.1  AR  Approach  of  Frequency  Estimation 

The AR approach of frequency estimation is based upon the following observations.
Suppose that $\{x_t\}$ is the sinusoidal signal in (1.1) which consists of $q$ superimposed
sinusoids with frequencies $\omega_1, \dots, \omega_q$. Let us denote by $z^{-1}$ the backward-shift operator
so that $z^{-1} x_t = x_{t-1}$, and consider the polynomial $A(z^{-1})$, the AR polynomial,
defined by

$$A(z^{-1}) := \prod_{k=1}^{q} (1 - z_k z^{-1})(1 - \bar z_k z^{-1}) = \sum_{j=0}^{2q} a_j z^{-j} \tag{2.1}$$

where $z_k := \exp(i\omega_k)$, and $\bar z_k := \exp(-i\omega_k)$ is the complex conjugate of $z_k$. The
following results can be obtained immediately from the definition of $A(z^{-1})$.

Lemma 2.1 Let $A(z^{-1})$ be the polynomial in (2.1). Then, (a) the $2q$ zeros of $A(z^{-1})$
are $z = \exp(\pm i\omega_k)$ $(k = 1, \dots, q)$; (b) the $2q + 1$ coefficients $a_j$ of $A(z^{-1})$ are real and
symmetric in the sense that

$$a_0 = 1 \quad \text{and} \quad a_{2q-j} = a_j \quad (j = 0, 1, \dots, q-1); \tag{2.2}$$

and (c) the $a_j$ uniquely determine, and are determined by, the frequencies $\omega_k$.

Proof. Part (a) and Part (c) are trivial. In Part (b), the $a_j$ are real because the
zeros of $A(z^{-1})$ are complex conjugate pairs, and the symmetry of $a_j$ follows from the
identity $z^{2q} A(z^{-1}) = A(z)$, since both $z_k$ and $z_k^{-1}$ are zeros of $A(z^{-1})$. ◇

Since $\exp(\pm i\omega_k)$ are zeros of $A(z^{-1})$, it is easy to verify that

$$A(z^{-1}) \exp(\pm i\omega_k t) = \exp(\pm i\omega_k t)\, A(\exp(\mp i\omega_k)) = 0$$

for all $t$. Notice that $x_t$ can be written as a linear combination of $\exp(\pm i\omega_k t)$.
Therefore, it follows that $A(z^{-1})\, x_t = 0$, namely,

$$\sum_{j=0}^{2q} a_j x_{t-j} = 0 \qquad (t = 0, \pm 1, \pm 2, \dots). \tag{2.3}$$

This is to say that the sinusoidal signal $\{x_t\}$ satisfies a homogeneous autoregressive
equation of order $2q$, with the AR coefficients being identical to the coefficients $a_j$ as
defined by (2.1).
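The properties in Lemma 2.1(b) and the AR equation (2.3) can be illustrated with a short computation. In this sketch the helper names are hypothetical, and the coefficients are built by multiplying out the real quadratic factors $1 - 2\cos(\omega_k) z^{-1} + z^{-2}$ of (2.1).

```python
import math

def poly_mul(p, q):
    """Multiply two polynomials in z^{-1} given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def ar_coefficients(omegas):
    """Coefficients a_0, ..., a_{2q} of A(z^{-1}) in (2.1)."""
    a = [1.0]
    for w in omegas:
        # (1 - z_k z^{-1})(1 - conj(z_k) z^{-1}) = 1 - 2 cos(w) z^{-1} + z^{-2}
        a = poly_mul(a, [1.0, -2.0 * math.cos(w), 1.0])
    return a

# Hypothetical signal with q = 2 sinusoids.
omegas, betas, phis = [0.31 * math.pi, 0.35 * math.pi], [1.0, 0.7], [0.2, -0.9]

def x(t):
    return sum(b * math.cos(w * t + p) for b, w, p in zip(betas, omegas, phis))

a = ar_coefficients(omegas)
# Left side of (2.3) over a range of t; it should vanish up to rounding error.
max_residual = max(abs(sum(a[j] * x(t - j) for j in range(len(a))))
                   for t in range(-20, 50))
```

The coefficient list reads the same forwards and backwards, as in (2.2), and the residual of (2.3) is at the level of floating-point rounding regardless of the amplitudes and phases.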

According to these results, the original problem of frequency estimation can be
equivalently stated as that of estimating the AR parameter

$$\mathbf{a} := [a_1, \dots, a_q]^T.$$

This reparametrization enables us to employ many well-studied linear methods that
usually end up with solving systems of linear equations. Once an estimate of $\mathbf{a}$ becomes
available, the frequency estimates can be obtained from the zeros of $A(z^{-1})$ in (2.1),
with the $a_j$ replaced by their estimates. We refer to this method as the AR approach
of frequency estimation.




2.2  Estimation  of  AR  Coefficients 


A widely-used procedure for estimating the AR parameter $\mathbf{a}$ is Prony's estimator
(Hildebrand, 1956; Kay and Marple, 1981), also known as the least squares (LS)
estimator, which can be summarized as follows. From (2.3), we obtain
$A(z^{-1}) y_t = A(z^{-1}) x_t + A(z^{-1}) \epsilon_t = A(z^{-1}) \epsilon_t$. This implies that $\{y_t\}$ satisfies the equation

$$\sum_{j=0}^{2q} a_j y_{t-j} = e_t \qquad (t = 0, \pm 1, \pm 2, \dots) \tag{2.4}$$

where $e_t := A(z^{-1})\, \epsilon_t$ depends on the noise. For the time series $\{y_1, \dots, y_n\}$ of length
$n > 2q$, the AR equation (2.4), together with (2.2), yields the following multivariate
linear regression model

$$\mathbf{y} = -\mathbf{Y}\mathbf{Q}\mathbf{a} + \mathbf{e}. \tag{2.5}$$


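In this expression, $\mathbf{y}$ and $\mathbf{Y}$ are data matrices as defined by

$$\mathbf{y} := \begin{bmatrix} y_{2q+1} + y_1 \\ \vdots \\ y_n + y_{n-2q} \end{bmatrix}
\quad \text{and} \quad
\mathbf{Y} := \begin{bmatrix} y_{2q} & \cdots & y_2 \\ \vdots & & \vdots \\ y_{n-1} & \cdots & y_{n-2q+1} \end{bmatrix} \tag{2.6}$$

respectively, and $\mathbf{e}$ the error term given by

$$\mathbf{e} := [e_{2q+1}, \dots, e_n]^T. \tag{2.7}$$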


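The $(2q-1)$-by-$q$ matrix $\mathbf{Q}$ takes care of the symmetry of the AR coefficients and
admits the following form

$$\mathbf{Q} := \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0}^T & 1 \\ \tilde{\mathbf{I}} & \mathbf{0} \end{bmatrix}$$

where $\mathbf{I}$ stands for the $(q-1)$-by-$(q-1)$ identity matrix, $\tilde{\mathbf{I}}$ the $(q-1)$-by-$(q-1)$ reverse
permutation matrix, with 1's on the anti-diagonal and 0's elsewhere, and $\mathbf{0}$ the zero
vector of dimension $q - 1$.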




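Prony's estimator, or the LS estimator, of the AR parameter $\mathbf{a}$, denoted by $\hat{\mathbf{a}}_{LS}$, is
defined as the minimizer of the sum of squared errors $\|\mathbf{e}\|^2 = \|\mathbf{y} + \mathbf{Y}\mathbf{Q}\mathbf{a}\|^2$. A simple
calculation shows that $\hat{\mathbf{a}}_{LS}$ satisfies the following normal equations

$$\mathbf{Q}^T\mathbf{Y}^T\mathbf{Y}\mathbf{Q}\, \hat{\mathbf{a}}_{LS} = -\mathbf{Q}^T\mathbf{Y}^T\mathbf{y}. \tag{2.8}$$

As will be seen shortly, the $q$-by-$q$ matrix $\mathbf{Q}^T\mathbf{Y}^T\mathbf{Y}\mathbf{Q}$ is almost surely nonsingular if $n$
is sufficiently large. Therefore, Prony's estimator $\hat{\mathbf{a}}_{LS}$ can be explicitly written as

$$\hat{\mathbf{a}}_{LS} = -(\mathbf{Q}^T\mathbf{Y}^T\mathbf{Y}\mathbf{Q})^{-1}\mathbf{Q}^T\mathbf{Y}^T\mathbf{y}. \tag{2.9}$$

As compared to the nonlinear optimization required by NLS, the estimator $\hat{\mathbf{a}}_{LS}$ is
relatively easy to compute. The computational simplicity is one of the reasons that
the AR approach is preferred in many applications where fast algorithms are desired
for frequency estimation.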
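For a single sinusoid ($q = 1$) the estimator (2.9) reduces to a scalar ratio, since $\mathbf{Q} = [1]$, $\mathbf{Y}$ has the single column $y_{t-1}$ and $\mathbf{y}$ has entries $y_t + y_{t-2}$. The following sketch treats only this special case; the function names are hypothetical, and the frequency is recovered from $a_1 = -2\cos\omega_1$.

```python
import math

def prony_single(y):
    """Prony (LS) estimate of a_1 for q = 1, i.e. (2.9) with Q = [1]."""
    # y is 0-indexed; index t below runs over the positions with two predecessors.
    num = sum(y[t - 1] * (y[t] + y[t - 2]) for t in range(2, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(2, len(y)))
    return -num / den

def frequency_from_a1(a1):
    """Zeros of 1 + a1 z^{-1} + z^{-2} are exp(+-i w) with a1 = -2 cos(w)."""
    return math.acos(max(-1.0, min(1.0, -a1 / 2.0)))

# Noise-free illustration: the AR equation (2.3) holds exactly.
omega_true = 0.3 * math.pi
y = [1.3 * math.cos(omega_true * t + 0.5) for t in range(1, 51)]
omega_hat = frequency_from_a1(prony_single(y))
```

Without noise the frequency is recovered exactly from a closed-form expression, with no iterative search and no initial guess, which is the computational appeal of the AR approach.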

In the literature, there is an alternative method, known as the forward-backward
linear prediction (FBLP), which is claimed to work better than straightforward least
squares for AR estimation (Kay and Marple, 1981). Instead of minimizing the sum of
squared forward prediction errors

$$\|\mathbf{e}\|^2 = \sum_{t=2q+1}^{n} \Big( \sum_{j=0}^{2q} a_j y_{t-j} \Big)^2,$$

the FBLP method minimizes the sum of squared forward and backward prediction
errors, namely,

$$\sum_{t=2q+1}^{n} \Big( \sum_{j=0}^{2q} a_j y_{t-j} \Big)^2 + \sum_{t=2q+1}^{n} \Big( \sum_{j=0}^{2q} a_j y_{t-2q+j} \Big)^2.$$

In our case, however, since the AR coefficients $a_j$ are symmetric, the backward
prediction error coincides with the forward prediction error, i.e.,

$$\sum_{j=0}^{2q} a_j y_{t-2q+j} = \sum_{j=0}^{2q} a_j y_{t-j}.$$

This can be easily verified upon changing the variable $j$ in the first summation to
$2q - j$ and then using the symmetry of $a_j$. Consequently, the FBLP method produces
the same AR estimator $\hat{\mathbf{a}}_{LS}$ as given by (2.9).

2.3  Inconsistency  of  Prony’s  Estimator 

The computational simplicity of Prony's estimator makes it attractive in many
applications. However, the estimator has been found to be inconsistent for frequency
estimation (Kay and Marple, 1981; Dragosevic and Stankovic, 1989), namely, as the
data length $n$ grows, $\hat{\mathbf{a}}_{LS}$ does not converge to the desired AR parameter $\mathbf{a}$.

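According to the large sample theory which we shall present in Chapter 3 (see also
Li and Kedem, 1992), for any $s$ and $\tau$, we have (see Chapter 3, Remark 3.8)

$$n^{-1} \sum_{t=2q+1}^{n} y_{t+\tau}\, y_{t+s} \xrightarrow{a.s.} E(y_{t+\tau}\, y_{t+s}) = r^y_{\tau-s}$$

as $n \to \infty$. Since the process is real-valued, we also have $r^y_\tau = r^y_{-\tau}$ for any $\tau$. Using
these results, it is easy to verify that, as $n \to \infty$,

$$n^{-1}\mathbf{Y}^T\mathbf{Y} \xrightarrow{a.s.} \mathbf{R}_y \quad \text{and} \quad n^{-1}\mathbf{Y}^T\mathbf{y} \xrightarrow{a.s.} \mathbf{r}_y + \tilde{\mathbf{r}}_y$$

where $\mathbf{R}_y$ and $\mathbf{r}_y$ are autocovariance matrices of $\{y_t\}$ as defined by

$$\mathbf{R}_y := \begin{bmatrix}
r^y_0 & r^y_1 & \cdots & r^y_{2q-2} \\
r^y_1 & r^y_0 & \cdots & r^y_{2q-3} \\
\vdots & \vdots & \ddots & \vdots \\
r^y_{2q-2} & r^y_{2q-3} & \cdots & r^y_0
\end{bmatrix}
\quad \text{and} \quad
\mathbf{r}_y := \begin{bmatrix} r^y_1 \\ r^y_2 \\ \vdots \\ r^y_{2q-1} \end{bmatrix} \tag{2.10}$$

respectively, and $\tilde{\mathbf{r}}_y := [r^y_{2q-1}, r^y_{2q-2}, \dots, r^y_1]^T$ is the backward rearrangement of $\mathbf{r}_y$.
Notice that $\mathbf{R}_y = \mathbf{R}_x + \mathbf{R}_\epsilon$, where $\mathbf{R}_x$ and $\mathbf{R}_\epsilon$ are autocovariance matrices of $\{x_t\}$ and
$\{\epsilon_t\}$, respectively, with the same structure as $\mathbf{R}_y$. It follows immediately that $\mathbf{R}_y$ is
nonsingular, since $\mathbf{R}_\epsilon$ is positive definite, by Proposition 5.1.1 in Brockwell and Davis
(1987), and $\mathbf{R}_x$ is at least non-negative definite¹.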

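As a consequence, Prony's estimator $\hat{\mathbf{a}}_{LS}$ can be written as (2.9) almost surely for
sufficiently large $n$, and, as $n \to \infty$, it converges almost surely to a deterministic limit,
as specified by

$$\hat{\mathbf{a}}_{LS} \xrightarrow{a.s.} -\bar{\mathbf{R}}_y^{-1}\, \bar{\mathbf{r}}_y \tag{2.11}$$

where

$$\bar{\mathbf{R}}_y := \mathbf{Q}^T\mathbf{R}_y\mathbf{Q} \quad \text{and} \quad \bar{\mathbf{r}}_y := \mathbf{Q}^T(\mathbf{r}_y + \tilde{\mathbf{r}}_y) = 2\,\mathbf{Q}^T\mathbf{r}_y. \tag{2.12}$$

Notice that the $q$-by-$q$ matrix $\bar{\mathbf{R}}_y$ is also nonsingular since $\mathbf{R}_y$ is nonsingular and $\mathbf{Q}$ is
of full column rank $q$.

The limit of $\hat{\mathbf{a}}_{LS}$ in (2.11) can be more conveniently expressed in terms of the AR
parameter $\mathbf{a}$ so that the bias of $\hat{\mathbf{a}}_{LS}$ can be easily identified. This result is presented
in the following lemma.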

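Lemma 2.2 Let $\hat{\mathbf{a}}_{LS}$ be Prony's estimator of $\mathbf{a}$ as defined in (2.9). Then, as $n \to \infty$,

$$\hat{\mathbf{a}}_{LS} \xrightarrow{a.s.} \mathbf{a} - \bar{\mathbf{R}}_y^{-1}(\bar{\mathbf{R}}_\epsilon \mathbf{a} + \bar{\mathbf{r}}_\epsilon) \tag{2.13}$$

where $\bar{\mathbf{R}}_\epsilon$ and $\bar{\mathbf{r}}_\epsilon$ are defined from $r^\epsilon_\tau$ in the same way as $\bar{\mathbf{R}}_y$ and $\bar{\mathbf{r}}_y$ in (2.12).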

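Proof. Denote by $\mathbf{X}$ and $\mathbf{Z}$ the matrices defined from $\{x_t\}$ and $\{\epsilon_t\}$, respectively,
in the same way as $\mathbf{Y}$ in (2.6). Then, we can write $\mathbf{Y} = \mathbf{X} + \mathbf{Z}$. Moreover, since
$e_t = A(z^{-1})\, \epsilon_t$, namely,

$$e_t = \sum_{j=0}^{2q} a_j \epsilon_{t-j} \qquad (t = 0, \pm 1, \pm 2, \dots), \tag{2.14}$$

it is easy to show that the error term $\mathbf{e}$ in (2.7) can be represented as

$$\mathbf{e} = \mathbf{Z}\mathbf{Q}\mathbf{a} + \mathbf{z}$$

where $\mathbf{z}$ is defined from $\{\epsilon_t\}$ in the same way as $\mathbf{y}$ in (2.6). As a result,

$$E(\mathbf{Y}^T\mathbf{e}) = E(\mathbf{Y}^T\mathbf{Z})\, \mathbf{Q}\mathbf{a} + E(\mathbf{Y}^T\mathbf{z}). \tag{2.15}$$

Since $\{x_t\}$ and $\{\epsilon_t\}$ are independent, it follows that $E(\mathbf{X}^T\mathbf{Z}) = \mathbf{0}$ and $E(\mathbf{X}^T\mathbf{z}) = \mathbf{0}$.
Therefore, we have

$$E(\mathbf{Y}^T\mathbf{Z}) = E(\mathbf{Z}^T\mathbf{Z}) = (n - 2q)\, \mathbf{R}_\epsilon,$$

$$E(\mathbf{Y}^T\mathbf{z}) = E(\mathbf{Z}^T\mathbf{z}) = (n - 2q)\, (\mathbf{r}_\epsilon + \tilde{\mathbf{r}}_\epsilon),$$

where $\mathbf{R}_\epsilon$ and $\mathbf{r}_\epsilon$ are autocovariance matrices of $\{\epsilon_t\}$ defined in the same way as $\mathbf{R}_y$
and $\mathbf{r}_y$, respectively. Substituting these results in (2.15) yields

$$E(\mathbf{Y}^T\mathbf{e}) = (n - 2q)\, (\mathbf{R}_\epsilon\mathbf{Q}\mathbf{a} + \mathbf{r}_\epsilon + \tilde{\mathbf{r}}_\epsilon).$$

This, together with (2.5), gives

$$E(\mathbf{Y}^T\mathbf{y}) = -E(\mathbf{Y}^T\mathbf{Y})\, \mathbf{Q}\mathbf{a} + E(\mathbf{Y}^T\mathbf{e})
= (n - 2q)\, (-\mathbf{R}_y\mathbf{Q}\mathbf{a} + \mathbf{R}_\epsilon\mathbf{Q}\mathbf{a} + \mathbf{r}_\epsilon + \tilde{\mathbf{r}}_\epsilon).$$

On the other hand, a straightforward calculation shows that
$E(\mathbf{Y}^T\mathbf{y}) = (n - 2q)\, (\mathbf{r}_y + \tilde{\mathbf{r}}_y)$. Thus, we obtain

$$\mathbf{r}_y + \tilde{\mathbf{r}}_y = -\mathbf{R}_y\mathbf{Q}\mathbf{a} + \mathbf{R}_\epsilon\mathbf{Q}\mathbf{a} + \mathbf{r}_\epsilon + \tilde{\mathbf{r}}_\epsilon.$$

By definition, $\bar{\mathbf{r}}_y = \mathbf{Q}^T(\mathbf{r}_y + \tilde{\mathbf{r}}_y) = -\bar{\mathbf{R}}_y\mathbf{a} + \bar{\mathbf{R}}_\epsilon\mathbf{a} + \bar{\mathbf{r}}_\epsilon$. Substituting this expression in
(2.11) proves the lemma. ◇

¹ In fact, $\mathbf{R}_x$ is positive definite, as we shall see later.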

Lemma 2.2 tells us that the almost sure limit of $\hat{\mathbf{a}}_{LS}$ is in general different from
the AR parameter $\mathbf{a}$ that we intend to estimate. It can be shown that the bias
$-\bar{\mathbf{R}}_y^{-1}(\bar{\mathbf{R}}_\epsilon\mathbf{a} + \bar{\mathbf{r}}_\epsilon)$ is more pronounced when the signal-to-noise ratio is not sufficiently
high. For example, let us consider (2.13) in the cases of a single sinusoid and of two
sinusoids, respectively.




Example 2.1 For $q = 1$, we have $a_1 = -2\cos\omega_1$, and (2.13) reduces to

$$\hat a_{LS} \xrightarrow{a.s.} a_1 - \frac{a_1 + 2\rho_1}{1 + \gamma}$$

where $\rho_\tau := r^\epsilon_\tau / r^\epsilon_0$ is the autocorrelation function of $\{\epsilon_t\}$, and $\gamma := r^x_0 / r^\epsilon_0$ the
signal-to-noise ratio. Clearly, the bias $-(a_1 + 2\rho_1)/(1 + \gamma)$ does not vanish unless $\gamma \to \infty$
or in the unusual case of $\rho_1 = \cos\omega_1$. In general the bias is inversely related to the
signal-to-noise ratio $\gamma$. Therefore, in order for Prony's estimator to provide satisfactory
results, the signal-to-noise ratio $\gamma$ should be very high (e.g., 30 dB), as reported in
many papers (see Kay and Marple, 1981, and references therein).
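The limit in Example 2.1 can be compared with a simulation. The sketch below is an illustration under our own assumptions: white Gaussian noise (so $\rho_1 = 0$), a fixed-phase sinusoid whose time-average power $\beta^2/2$ plays the role of $r^x_0$, a fixed random seed, and hypothetical function names; the scalar form of Prony's estimator for $q = 1$ is used.

```python
import math
import random

def prony_single(y):
    """Prony (LS) estimate of a_1 for q = 1, i.e. (2.9) with Q = [1]."""
    num = sum(y[t - 1] * (y[t] + y[t - 2]) for t in range(2, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(2, len(y)))
    return -num / den

def prony_limit_single(omega1, gamma, rho1=0.0):
    """Almost sure limit in Example 2.1: a_1 - (a_1 + 2 rho_1) / (1 + gamma)."""
    a1 = -2.0 * math.cos(omega1)
    return a1 - (a1 + 2.0 * rho1) / (1.0 + gamma)

rng = random.Random(0)                  # fixed seed, for reproducibility only
n, omega1 = 50000, 0.3 * math.pi
beta, sigma = math.sqrt(2.0), 1.0       # signal power beta^2 / 2 = 1, noise power 1
gamma = (beta ** 2 / 2.0) / sigma ** 2  # SNR of 1, i.e. 0 dB
y = [beta * math.cos(omega1 * t) + rng.gauss(0.0, sigma) for t in range(1, n + 1)]

a1_true = -2.0 * math.cos(omega1)
a1_hat = prony_single(y)
a1_limit = prony_limit_single(omega1, gamma)
omega_from_limit = math.acos(-a1_limit / 2.0)   # frequency implied by the limit
```

At 0 dB the estimate settles near $a_1 \gamma/(1+\gamma) = a_1/2$ rather than $a_1$, and the implied frequency is pulled toward $\pi/2$; increasing the data length does not remove this offset.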

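Example 2.2 For $q = 2$, it can be shown from (2.1) that

$$a_1 = -2(\cos\omega_1 + \cos\omega_2) \quad \text{and} \quad a_2 = 2(1 + 2\cos\omega_1 \cos\omega_2). \tag{2.16}$$

When the noise is white, we have $\bar{\mathbf{r}}_\epsilon = \mathbf{0}$, and thus the bias in (2.13) becomes

$$\frac{-1}{(1 - 2\rho_1^2 + \rho_2)\gamma^2 + (2 + \rho_2)\gamma + 1}
\begin{bmatrix} \gamma + 1 & -\rho_1\gamma \\ -2\rho_1\gamma & (1 + \rho_2)\gamma + 1 \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}$$

where $\rho_\tau := r^x_\tau / r^x_0$ is the autocorrelation function of $\{x_t\}$. This expression follows
from (2.13) with $\bar{\mathbf{R}}_\epsilon = r^\epsilon_0\, \mathbf{Q}^T\mathbf{Q}$ and $\mathbf{Q}^T\mathbf{Q} = \mathrm{diag}(2, 1)$. As we can see again, the
bias vanishes if $\gamma \to \infty$. On the other hand, when $\gamma \to 0$, the bias tends to $-\mathbf{a}$
and hence the limit of Prony's estimator equals $\mathbf{0}$, which, by (2.16), corresponds to
$\cos\omega_1 = -\cos\omega_2 = \sqrt{2}/2$, or, equivalently, $\omega_1 = \pi/4$ and $\omega_2 = 3\pi/4$. This explains
why Prony's frequency estimates tend to appear around $\omega_1 = \pi/4$ and $\omega_2 = 3\pi/4$
when the signal-to-noise ratio is low.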
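The two limiting regimes of Example 2.2 can be examined by evaluating (2.13) directly for $q = 2$. This is a sketch under stated assumptions: white noise of unit variance, two sinusoids of equal power so that $r^x_\tau / r^x_0 = \frac{1}{2}(\cos\omega_1\tau + \cos\omega_2\tau)$, and hypothetical function names; the frequencies are recovered by inverting (2.16).

```python
import math

def ar_params_q2(w1, w2):
    """(a_1, a_2) from (2.16)."""
    c1, c2 = math.cos(w1), math.cos(w2)
    return -2.0 * (c1 + c2), 2.0 * (1.0 + 2.0 * c1 * c2)

def frequencies_from_params(a1, a2):
    """Invert (2.16): cos(w1), cos(w2) solve x^2 + (a1/2) x + (a2 - 2)/4 = 0."""
    b, c = a1 / 2.0, (a2 - 2.0) / 4.0
    disc = math.sqrt(max(b * b - 4.0 * c, 0.0))
    roots = [(-b + disc) / 2.0, (-b - disc) / 2.0]
    return sorted(math.acos(max(-1.0, min(1.0, r))) for r in roots)

def prony_limit_white(w1, w2, gamma):
    """Limit (2.13) for q = 2 with white noise (assumed equal-power sinusoids)."""
    rho = lambda tau: 0.5 * (math.cos(w1 * tau) + math.cos(w2 * tau))
    a1, a2 = ar_params_q2(w1, w2)
    # Q^T R_y Q for Q = [[1, 0], [0, 1], [1, 0]], in units of the noise variance.
    m11 = 2.0 * (1.0 + gamma + gamma * rho(2))
    m12 = 2.0 * gamma * rho(1)
    m22 = 1.0 + gamma
    v1, v2 = 2.0 * a1, a2               # Q^T Q a with Q^T Q = diag(2, 1)
    det = m11 * m22 - m12 * m12
    bias1 = -(m22 * v1 - m12 * v2) / det
    bias2 = -(-m12 * v1 + m11 * v2) / det
    return a1 + bias1, a2 + bias2

w1, w2 = 0.31 * math.pi, 0.35 * math.pi
low_snr = frequencies_from_params(*prony_limit_white(w1, w2, 1e-9))
high_snr = frequencies_from_params(*prony_limit_white(w1, w2, 1e9))
```

At very low SNR the implied frequencies collapse to $\pi/4$ and $3\pi/4$ regardless of the true values, while at very high SNR they return to $\omega_1$ and $\omega_2$.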

Finally, it should be pointed out that the above asymptotic analysis applies not
only to Prony's estimator $\hat{\mathbf{a}}_{LS}$, but also to any AR estimator of the form $-\hat{\mathbf{R}}^{-1}\hat{\mathbf{r}}$, where
$\hat{\mathbf{R}}$ and $\hat{\mathbf{r}}$ are some consistent estimators of $\bar{\mathbf{R}}_y$ and $\bar{\mathbf{r}}_y$, respectively. In these cases, the
limiting expressions (2.11) and (2.13) always hold, and hence the same inconsistency
persists.




2.4  High-Order  AR  Method 


A way to reduce the bias of Prony's estimator is to increase the order of the AR model.
In fact, for any $m > 2q$, let $B(z^{-1}) = \sum_{j=0}^{m-2q} b_j z^{-j}$ be an arbitrary polynomial in $z^{-1}$
of degree $m - 2q$ with $b_0 = 1$. Then, it follows from (2.3) that $B(z^{-1})A(z^{-1})x_t = 0$
for all $t$. Clearly, the product

\[ C(z^{-1}) := B(z^{-1})A(z^{-1}) \tag{2.17} \]

is a polynomial in $z^{-1}$ of degree $m$, i.e., $C(z^{-1}) = \sum_{j=0}^{m} c_j z^{-j}$ with $c_0 = 1$. This implies
that $\{x_t\}$ satisfies the following high-order AR equation

\[ \sum_{j=0}^{m} c_j x_{t-j} = 0 \qquad (t = 0, \pm 1, \pm 2, \ldots) \tag{2.18} \]

where $m \ge 2q$. Notice that (2.18) reduces to (2.3) when $m = 2q$. In general, when $m > 2q$,
the AR coefficients $c_j$ in (2.18) are no longer symmetric, since $B(z^{-1})$ is arbitrary,
and hence the zeros of the AR polynomial $C(z^{-1})$ are not necessarily reciprocal pairs
as in the case of $m = 2q$. Nevertheless, $C(z^{-1})$ has at least $2q$ zeros on the unit circle
$|z| = 1$ which coincide with the zeros of $A(z^{-1})$ that determine the frequencies $\omega_k$.
This gave rise to the idea of estimating the AR coefficients $c_j$ without the restriction
of symmetry. When estimates of the $c_j$ become available, the frequency estimates can be
obtained either from the $2q$ zeros of the (estimated) AR polynomial $\hat C(z^{-1})$ which are
on or closest to the unit circle, or from the $2q$ maxima in the (estimated) AR spectrum
$|\hat C(\exp(-i\omega))|^{-2}$ (Lang and McClellan, 1980; Kay and Marple, 1981).
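The factorization (2.17) can be illustrated with a short numerical sketch, given here in Python with NumPy under assumed values (the frequencies, amplitudes, phases, and the polynomial `B` are arbitrary choices for illustration). It multiplies $A(z^{-1})$ by an arbitrary $B(z^{-1})$, checks that the product annihilates a noiseless sinusoidal signal as in (2.18), and confirms that the $2q$ unit-circle zeros survive even though the coefficients are no longer symmetric.

```python
import numpy as np

omegas = np.array([0.6, 1.9])          # illustrative frequencies, q = 2
# A(z^-1) = prod_k (1 - 2 cos(w_k) z^-1 + z^-2), coefficients in powers of z^-1
A = np.array([1.0])
for w in omegas:
    A = np.convolve(A, [1.0, -2.0 * np.cos(w), 1.0])

B = np.array([1.0, -0.5, 0.25])        # arbitrary B with b_0 = 1, degree m - 2q = 2
C = np.convolve(B, A)                  # C(z^-1) = B(z^-1) A(z^-1), degree m = 6

# A noiseless sum of two sinusoids satisfies sum_j c_j x_{t-j} = 0.
t = np.arange(200)
x = 1.0 * np.cos(0.6 * t + 0.3) + 0.8 * np.cos(1.9 * t - 1.1)
residual = np.convolve(x, C, mode="valid")   # sum_j c_j x_{t-j} for every t with a full window

# C is not symmetric in general, but it keeps the 2q unit-circle zeros of A.
roots = np.roots(C)
on_circle = roots[np.isclose(np.abs(roots), 1.0, atol=1e-8)]
freqs = np.sort(np.angle(on_circle)[np.angle(on_circle) > 0])
```

Here the two zeros contributed by `B` have modulus 0.5, so they are easy to separate from the signal zeros; with estimated coefficients the separation is less clear-cut.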

Strictly speaking, the $c_j$ should be restricted to those which admit the factorization
(2.17) in order to guarantee that the resulting $C(z^{-1})$ has at least $2q$ zeros on the unit
circle. In the existing high-order AR methods, however, an unconstrained AR($m$)
model is usually used, perhaps due to the lack of a convenient description of this
restriction in terms of the $c_j$.




Now consider the LS estimator $\hat c_{LS}$ that minimizes the criterion

\[ \sum_{t=m+1}^{n} \Bigl( \sum_{j=0}^{m} c_j y_{t-j} \Bigr)^2 . \]

Owing to the strong ergodicity of $\{y_t\}$, it can be shown as before that $\hat c_{LS}$ converges
almost surely to a deterministic limit. For white noise, the limit can be written as
(Stoica, Friedlander, and Soderstrom, 1987)

\[ c + O(m^{-2}) \]

where the $m$-vector $c$ is the minimum-norm solution of the Yule-Walker equations
corresponding to the autocovariances of $\{x_t\}$. The minimum-norm solution $c$ has some
very interesting properties. For example, it was shown (Tufts and Kumaresan, 1982;
Stoica, Friedlander, and Soderstrom, 1987) that the corresponding AR polynomial
$C(z^{-1})$ can be factorized as in (2.17) with all the zeros of $B(z^{-1})$ lying strictly inside
the unit circle. This, together with the fact that the difference between the limit of $\hat c_{LS}$
and $c$ vanishes as $m$ increases without bound, indicates that for sufficiently large $n$ and
$m$ the high-order AR method produces satisfactory frequency estimates by locating
the $2q$ zeros on or closest to the unit circle in the AR polynomial corresponding to the
estimator $\hat c_{LS}$.
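The property of the minimum-norm solution quoted above can be observed numerically. The sketch below, in Python with NumPy, is an illustration under assumed values (amplitudes, frequencies, and the order $m = 10$ are arbitrary) and assumes the signal autocovariance $r^x_\tau = \sum_k \tfrac12\beta_k^2\cos(\omega_k\tau)$. It solves the rank-deficient Yule-Walker system with a pseudo-inverse, which returns the minimum-norm solution, and then inspects the zeros of the resulting AR polynomial.

```python
import numpy as np

betas = np.array([1.0, 0.8])       # illustrative amplitudes
omegas = np.array([0.5, 1.2])      # illustrative frequencies in (0, pi)
q, m = len(omegas), 10             # AR order m > 2q

# Autocovariances of the noiseless signal: r_tau = sum_k (beta_k^2 / 2) cos(w_k tau)
tau = np.arange(m + 1)
r = (0.5 * betas**2 * np.cos(np.outer(tau, omegas))).sum(axis=1)

# Yule-Walker system R c = -[r_1, ..., r_m]^T with R = [r_{|i-j|}] (m x m, rank 2q)
idx = np.arange(m)
R = r[np.abs(idx[:, None] - idx[None, :])]
c = -np.linalg.pinv(R, rcond=1e-10) @ r[1:]    # minimum-norm solution

# Zeros of C(z^-1) = 1 + c_1 z^-1 + ... + c_m z^-m
roots = np.roots(np.concatenate(([1.0], c)))
order = np.argsort(np.abs(np.abs(roots) - 1.0))
signal_roots, extra_roots = roots[order[:2 * q]], roots[order[2 * q:]]
freqs = np.sort(np.angle(signal_roots)[np.angle(signal_roots) > 0])
```

The $2q$ zeros closest to the unit circle recover the frequencies, and the remaining $m - 2q$ zeros fall inside the circle, in line with the cited result. The explicit `rcond` is a design choice of this sketch: it makes the rank truncation in the pseudo-inverse independent of rounding noise in the zero singular values.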

In a recent paper by Mackisack and Poskitt (1989), it was shown for the simplest
case of a single sinusoid ($q = 1$) in additive white noise that the high-order AR method
leads to a consistent frequency estimate as the order $m$ of the AR model increases at
a certain rate (e.g., $m^6/n \to \infty$ and $m^2/n \to 0$) along with the data length $n$. For
short data records, it was suggested that $m$ be chosen between $n/3$ and $n/2$ in order
to obtain satisfactory results (Kay and Marple, 1981, and references therein).

The weakness of the high-order AR method is that many spurious zeros (or spurious
peaks in the AR spectrum) are introduced as the order of the AR model increases (Kay
and Marple, 1981, and references therein). In some cases, especially when the
signal-to-noise ratio is low, it could be difficult to identify the zeros (peaks) corresponding
to the sinusoidal signal from a large number of zeros (peaks) in the AR polynomial
(spectrum). Figure 2.1 illustrates the performance of the high-order AR method. In
this figure, the AR log-spectrum from the Burg estimator (Kay and Marple, 1981)
was plotted, as a function of the normalized frequency $f$, for the same data of four
sinusoids in additive white noise with $n = 100$ as in Figure 1.2. It shows that
closely-spaced frequencies can be resolved and satisfactory frequency estimates obtained
when the order $m$ is sufficiently high. On the other hand, it is also evident that the
spurious peaks present in the AR spectrum may make it difficult to locate the global
maxima that yield the frequency estimates.

Figure 2.1: Plot of high-order AR log-spectrum for the same data with $n = 100$ as
in Figure 1.2. The SNR is 3 dB per sinusoid. (a) For $m = 25$, the well-separated
frequencies are resolved but the closely-spaced frequencies are not. (b) For $m = 50$, all
the frequencies are resolved, but a large number of spurious peaks may cause difficulties
in the calculation of the frequency estimates.

Notice that $\hat c_{LS}$ can be written as $\hat c_{LS} = -\hat R^{-1}\hat r$ where $\hat R$ is the $m$-by-$m$ sample
autocovariance matrix of $\{y_t\}$. In the noiseless case, the corresponding sample
autocovariance matrix is defined from $\{x_t\}$ and can be shown to be of rank $q$. Making use of
this property, one can apply the principal component (PC) analysis and approximate
the "noisy" matrix $\hat R$ from $\{y_t\}$ by a rank-$q$ matrix $\tilde R$ corresponding to the principal
eigenvalues (Tufts and Kumaresan, 1982; Kay, 1988). The resulting AR estimator is
given by $\hat c_{PC} = -\tilde R^{\dagger}\hat r$ where $\tilde R^{\dagger}$ is the pseudo-inverse of $\tilde R$. Simulations have shown
that the PC method is able to improve the original least squares estimator $\hat c_{LS}$ and
to produce very good estimates when the signal-to-noise ratio is high or moderate
(Kay, 1988). This is not surprising since the PC method cleans the noise with the
help of principal component analysis, while the least squares merely tries to explain
the noise with extra poles in the AR spectrum. The difficulty of the PC method,
however, is that a singular-value decomposition has to be used for principal component
analysis and hence the computational burden could be too high for some applications,
especially when $m$ is large, as required for efficient estimates (Kay and Shaw, 1988).
Moreover, in the case of low signal-to-noise ratio, the PC method is not as efficient as
other procedures which employ linear filters for noise-cleaning (Kay, 1988; Dragosevic
and Stankovic, 1989).




2.5  Nonsingularity  of  Autocovariance  Matrix 


Before ending this chapter, we would like to show that the autocovariance matrix $R_x$
of the signal $\{x_t\}$ is positive definite and hence nonsingular. This result is presented
in the following lemma.

Lemma 2.3 Let $R_x$ be the autocovariance matrix of $\{x_t\}$ with the same structure as
$R_y$ in (2.10). Then, $R_x$ has full rank $2q - 1$ and can be decomposed as

\[ R_x = S P S^H \]

where $P$ is a $2q$-by-$2q$ diagonal matrix of full rank $2q$ and $S$ a $(2q - 1)$-by-$2q$
Vandermonde matrix of full rank $2q - 1$. The superscript $H$ stands for the Hermitian
transposition.

Proof. Let us first extend the notation $\beta_k$ and $\omega_k$ for $k = q + 1, \ldots, 2q$ by defining

\[ \beta_k := \beta_{2q-k+1} \quad\text{and}\quad \omega_k := -\omega_{2q-k+1} \qquad (k = q + 1, \ldots, 2q). \]

Then, the autocovariance function $r^x_\tau$ can be written as

\[ r^x_\tau = \frac{1}{2}\sum_{k=1}^{q} \beta_k^2 \cos(\omega_k\tau) = \frac{1}{4}\sum_{k=1}^{2q} \beta_k^2 z_k^{\tau} \tag{2.19} \]

where $z_k := \exp(i\omega_k)$, $(k = 1, \ldots, 2q)$. Consider the $2q$-by-$2q$ diagonal matrix

\[ P := \tfrac{1}{4}\,\mathrm{diag}(\beta_1^2, \ldots, \beta_{2q}^2). \tag{2.20} \]

Clearly, $P$ is nonsingular since $\beta_k > 0$ for all $k$. It is easy to verify from (2.19) and
the definition of $R_x$ in (2.10) that $R_x = S P S^H$ where

\[ S := [s_1, \ldots, s_{2q}] \quad\text{and}\quad s_k := [1, z_k, \ldots, z_k^{2q-2}]^T. \tag{2.21} \]

To show that $S$ has full (row) rank $2q - 1$, we first note that $S^H$ can be written as

\[ S^H = [c_0, \ldots, c_{2q-2}] \]

where $c_j := [z_1^{-j}, \ldots, z_{2q}^{-j}]^T$, $(j = 0, \ldots, 2q - 2)$. Suppose that there exist constants $p_j$
such that $\sum_{j=0}^{2q-2} p_j c_j = 0$. Then $\Phi(z_k^{-1}) = \sum_{j=0}^{2q-2} p_j z_k^{-j} = 0$ for all
$k = 1, \ldots, 2q$; namely, the polynomial $\Phi(z) := \sum_{j=0}^{2q-2} p_j z^j$ has $2q$ distinct zeros $z_k^{-1}$
while its degree is at most $2q - 2$. This is only possible when $p_j = 0$ for all $j$. As a
result, the $c_j$ are linearly independent and hence the rank of $S$ is equal to $2q - 1$. Since $P$
is positive definite and $S$ has full row rank, $S P S^H$ is positive definite, and the
nonsingularity of $R_x$ follows immediately. $\Box$
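Lemma 2.3 can be verified numerically for a small case. The sketch below is in Python with NumPy. It assumes that $R_x$ is the $(2q-1)$-by-$(2q-1)$ Toeplitz matrix $[r^x_{|i-j|}]$ with $r^x_\tau = \sum_k \tfrac12\beta_k^2\cos(\omega_k\tau)$, and the amplitudes and frequencies are arbitrary illustrative values. It builds $S$ and $P$ as in (2.20) and (2.21) and compares $SPS^H$ with $R_x$.

```python
import numpy as np

betas = np.array([1.0, 0.8])       # illustrative amplitudes
omegas = np.array([0.5, 1.2])      # illustrative frequencies in (0, pi)
q = len(omegas)
dim = 2 * q - 1

# Extended notation: beta_k := beta_{2q-k+1}, omega_k := -omega_{2q-k+1} for k > q
beta_ext = np.concatenate((betas, betas[::-1]))
omega_ext = np.concatenate((omegas, -omegas[::-1]))
z = np.exp(1j * omega_ext)

# R_x = [r_{|i-j|}] with r_tau = sum_k (beta_k^2 / 2) cos(omega_k tau)
tau = np.arange(dim)
r = (0.5 * betas**2 * np.cos(np.outer(tau, omegas))).sum(axis=1)
R_x = r[np.abs(tau[:, None] - tau[None, :])]

# Vandermonde factor S (columns s_k = [1, z_k, ..., z_k^{2q-2}]^T) and diagonal P
S = z[None, :] ** tau[:, None]
P = np.diag(beta_ext**2 / 4.0)

reconstruction = S @ P @ S.conj().T
rank_S = np.linalg.matrix_rank(S)
min_eig = np.linalg.eigvalsh(R_x).min()
```

For these values the smallest eigenvalue of $R_x$ is about 0.05, so the matrix is positive definite but not well separated from singularity; closely spaced frequencies push it closer to zero.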

Remark 2.1 Since $Q$ has full column rank $q$, Lemma 2.3 implies that the matrix
$Q^T R_x Q = Q^T S P S^H Q$ is nonsingular.




Chapter  3 


Limit Theorems of Sample Autocovariance Function


In this chapter, we investigate some limiting properties of the sample autocovariance
function (SACF) of the process $\{y_t\}$ in (1.1) concerning its consistency and asymptotic
distribution. In particular, we provide a central limit theorem for the sample
autocovariance function, and consider the uniform consistency of the sample
autocovariance function after parametric filtering. Since $\{y_t\}$ has a mixed spectrum (its
spectrum consists of a discrete part corresponding to the sinusoids and a continuous
part corresponding to the noise), the central limit theorem in this chapter extends
the classical results for a time series with continuous spectrum. The consistency of the
sample autocovariance function of $\{y_t\}$ has been used in the engineering literature for a
long time without rigorous proof. Therefore, the results on uniform consistency of the
sample autocovariance function after parametric filtering not only fill this gap but also
provide some insight into the effect of parametric filtering on the sample autocovariance
function of a time series with mixed spectrum. The limit theorems developed in
this chapter lay the foundation of the asymptotic analysis for the parametric filtering
method of frequency estimation which we shall present in the next chapter.




3.1  Asymptotic  Normality  of  SACF 


Given a finite sample $\{y_1, \ldots, y_n\}$ from the random process in (1.1), let us consider
the sample autocovariance function (SACF) as defined by

\[ \hat r_j := n^{-1}\sum_{t=1}^{n-j} y_{t+j}\,y_t \qquad (j = 0, 1, \ldots, p) \tag{3.1} \]

where $p$ is a fixed integer with $0 < p < n$. In this section, we would like to show that
the $\hat r_j$ are asymptotically jointly normal as $n \to \infty$.

For simplicity, let us first establish the normality for the similar quantities

\[ \tilde r_j := n^{-1}\sum_{t=1}^{n} y_{t+j}\,y_t \qquad (j = 0, 1, \ldots, p), \tag{3.2} \]

and then prove the asymptotic negligibility of the differences $\tilde r_j - \hat r_j$.
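For concreteness, the estimator (3.1) can be computed as in the following sketch (Python with NumPy; the function name `sacf` is an illustrative choice). Note that no sample mean is subtracted, in keeping with (3.1).

```python
import numpy as np

def sacf(y, p):
    """Sample autocovariances (3.1): n^-1 * sum_{t=1}^{n-j} y_{t+j} y_t for j = 0..p.

    No mean is removed, matching the definition in (3.1).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    return np.array([np.dot(y[j:], y[:n - j]) / n for j in range(p + 1)])

example = sacf([1.0, 2.0, 3.0, 4.0], p=2)
```

The companion quantity in (3.2) sums over $t = 1, \ldots, n$ and therefore involves $y_{n+1}, \ldots, y_{n+j}$, which lie outside the observed record; it serves as a device for the proof rather than as a computable estimator.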

We assume for the time being that the phases $\phi_k$ of the sinusoids are constants.
This assumption, however, does not alter the following asymptotic theory. In fact,
assuming constant $\phi_k$ is equivalent to considering the conditional properties of the
SACF given $\phi_k$. As we shall see, the conditional asymptotic distribution of $\hat r_j$ given $\phi_k$
does not depend on $\phi_k$, and therefore coincides with the unconditional distribution.

The following lemma shows that the covariance of $\tilde r_i$ and $\tilde r_j$ is of order $n^{-1}$; more
precisely, $n\,\mathrm{cov}(\tilde r_i, \tilde r_j)$ converges to a finite limit.

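Lemma 3.1 Suppose that $E(\xi_t^4) = \kappa\sigma_\xi^4 < \infty$ where $\{\xi_t\}$ is the i.i.d. sequence in (1.2).
Then, it follows that

\[ \lim_{n\to\infty} n\,\mathrm{cov}(\tilde r_i, \tilde r_j) = \lim_{n\to\infty} n\,E\{(\tilde r_i - r^y_i)(\tilde r_j - r^y_j)\} = \sigma_{ij} \]

where $\sigma_{ij}$ is finite and can be written as

\[ \sigma_{ij} := \sum_{k=1}^{q} 2\beta_k^2 \cos(\omega_k i)\cos(\omega_k j) \sum_{\tau=-\infty}^{\infty} r^\epsilon_\tau \cos(\omega_k\tau)
 + (\kappa - 3)\, r^\epsilon_i r^\epsilon_j + \sum_{\tau=-\infty}^{\infty} \bigl(r^\epsilon_\tau r^\epsilon_{\tau+i-j} + r^\epsilon_{\tau+i} r^\epsilon_{\tau-j}\bigr). \tag{3.3} \]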
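To make (3.3) concrete, the sketch below evaluates $\sigma_{ij}$ in Python with NumPy for noise whose autocovariance vanishes beyond a finite lag. The function name `sigma_ij` and its argument layout are illustrative assumptions, not notation from the text.

```python
import numpy as np

def sigma_ij(i, j, betas, omegas, r_eps, kappa):
    """Evaluate (3.3) when the noise autocovariance r_eps[0..L] vanishes beyond lag L."""
    r_eps = np.asarray(r_eps, dtype=float)
    L = len(r_eps) - 1

    def r(tau):                       # symmetric extension, zero beyond lag L
        tau = abs(tau)
        return r_eps[tau] if tau <= L else 0.0

    # Every nonzero term of the infinite sums in (3.3) lies in this lag window.
    lags = range(-L - abs(i) - abs(j), L + abs(i) + abs(j) + 1)
    # Sinusoidal part: sum_k 2 beta_k^2 cos(w_k i) cos(w_k j) sum_tau r_tau cos(w_k tau)
    sinus = sum(
        2.0 * b**2 * np.cos(w * i) * np.cos(w * j) * sum(r(t) * np.cos(w * t) for t in lags)
        for b, w in zip(betas, omegas)
    )
    # Continuous-spectrum part: (kappa - 3) r_i r_j + sum_tau (r_tau r_{tau+i-j} + r_{tau+i} r_{tau-j})
    classical = (kappa - 3.0) * r(i) * r(j) + sum(
        r(t) * r(t + i - j) + r(t + i) * r(t - j) for t in lags
    )
    return sinus + classical

# One unit-amplitude sinusoid in unit-variance Gaussian white noise (kappa = 3).
s00 = sigma_ij(0, 0, betas=[1.0], omegas=[1.0], r_eps=[1.0], kappa=3.0)
```

For a single sinusoid in Gaussian white noise with unit variance, $\sigma_{00} = 2\beta_1^2 + 2$: the first part comes from the cross term $2x_t\epsilon_t$ in $y_t^2$ and the second from the variance of $\epsilon_t^2$.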




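Proof. Using the trigonometric identity (1.14), it is easy to show that

\[ \cos(\omega_k t + \phi_k)\cos(\omega_k(t + j) + \phi_k) = \tfrac{1}{2}\{\cos(\omega_k j) + \cos(\omega_k(2t + j) + 2\phi_k)\}. \]

Therefore, from (1.1), we can write $\tilde r_j$ as

\[ \tilde r_j = r^y_j + n^{-1}\sum_{t=1}^{n} \zeta_{jt} + (2n)^{-1}\sum_{k=1}^{q} \beta_k^2 \sum_{t=1}^{n} \cos(\omega_k(2t + j) + 2\phi_k)
 + n^{-1}\sum_{\substack{k,k'=1 \\ k \ne k'}}^{q} \beta_k\beta_{k'} \sum_{t=1}^{n} \cos(\omega_k t + \phi_k)\cos(\omega_{k'}(t + j) + \phi_{k'}) \tag{3.4} \]

where $r^y_j$ is given by (1.3) and (1.4), and $\zeta_{jt}$ is defined by

\[ \zeta_{jt} := x_t\epsilon_{t+j} + x_{t+j}\epsilon_t + \epsilon_{t+j}\epsilon_t - r^\epsilon_j. \tag{3.5} \]

Since $\cos\omega = \{\exp(i\omega) + \exp(-i\omega)\}/2$, it is not difficult to show that

\[ \Bigl|\sum_{t=1}^{n} \cos(\omega_k(2t + j) + 2\phi_k)\Bigr| \le |\sin\omega_k|^{-1}. \]

Similarly, using the trigonometric identity (1.14), we obtain

\[ \Bigl|\sum_{t=1}^{n} \cos(\omega_k t + \phi_k)\cos(\omega_{k'}(t + j) + \phi_{k'})\Bigr|
 \le \tfrac{1}{2}\bigl\{|\sin((\omega_k + \omega_{k'})/2)|^{-1} + |\sin((\omega_k - \omega_{k'})/2)|^{-1}\bigr\} \]

for any $\phi_k$, $j$, and $k \ne k'$. Since $\omega_k \in (0, \pi)$ for all $k$, there exists a constant $K > 0$
such that $|\sin\omega_k|^{-1} \le K$ and $|\sin((\omega_k \pm \omega_{k'})/2)|^{-1} \le K$ for all $k \ne k'$. This, together
with (3.4), implies that

\[ \tilde r_j = r^y_j + n^{-1}\sum_{t=1}^{n} \zeta_{jt} + O(n^{-1}). \tag{3.6} \]

Moreover, since $E(\zeta_{jt}) = 0$, we also have

\[ E(\tilde r_j) = r^y_j + O(n^{-1}). \]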
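The key point behind (3.6) is that the oscillatory sums in (3.4) stay bounded as $n$ grows, so that dividing by $n$ leaves an $O(n^{-1})$ remainder. A quick numerical check of the first bound, sketched in Python with NumPy under arbitrary illustrative values of $\omega$, $\phi$, and $j$:

```python
import numpy as np

def cosine_sum(omega, phi, j, n):
    """Sum_{t=1}^n cos(omega (2t + j) + 2 phi), the oscillatory term in (3.4)."""
    t = np.arange(1, n + 1)
    return np.cos(omega * (2 * t + j) + 2 * phi).sum()

omega, phi, j = 0.7, 0.4, 3                      # illustrative values
bound = 1.0 / abs(np.sin(omega))                 # the bound |sin(omega)|^-1
sums = np.array([cosine_sum(omega, phi, j, n) for n in (10, 100, 1000, 10000)])
```

The bound follows from summing the geometric series $\sum_t \exp(2i\omega t)$, whose modulus equals $|\sin(n\omega)/\sin\omega|$.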

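Combining these results yields

\[ \lim_{n\to\infty} n\,\mathrm{cov}(\tilde r_i, \tilde r_j) = \lim_{n\to\infty} n\,E\{(\tilde r_i - r^y_i)(\tilde r_j - r^y_j)\}
 = \lim_{n\to\infty} n^{-1}\,\mathrm{cov}\Bigl(\sum_{t=1}^{n}\zeta_{it}, \sum_{t=1}^{n}\zeta_{jt}\Bigr). \]

To proceed with the proof, we note that $\{x_t\}$ is deterministic under the constant phase
assumption. Therefore, from (3.5), we obtain

\[ n^{-1}\,\mathrm{cov}\Bigl(\sum_{t=1}^{n}\zeta_{it}, \sum_{t=1}^{n}\zeta_{jt}\Bigr) = n^{-1}\sum_{t,s=1}^{n} E(\zeta_{it}\zeta_{js}) = I_1 + I_2 + I_3 \]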

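where

\[ I_1 := n^{-1}\sum_{t,s=1}^{n} \bigl(x_t x_s r^\epsilon_{t-s+i-j} + x_t x_{s+j} r^\epsilon_{t-s+i} + x_{t+i} x_s r^\epsilon_{t-s-j} + x_{t+i} x_{s+j} r^\epsilon_{t-s}\bigr) := T_1 + T_2 + T_3 + T_4, \]

\[ I_2 := n^{-1}\sum_{t,s=1}^{n} \bigl\{x_t\,c(t - s + i, j) + x_{t+i}\,c(t - s, j) + x_s\,c(s - t + j, i) + x_{s+j}\,c(s - t, i)\bigr\}, \]

\[ I_3 := n^{-1}\,\mathrm{cov}\Bigl(\sum_{t=1}^{n} \epsilon_{t+i}\epsilon_t, \sum_{t=1}^{n} \epsilon_{t+j}\epsilon_t\Bigr), \]

and $c(u, v) := E(\epsilon_{t+u}\epsilon_{t+v}\epsilon_t)$ is the third-order cumulant function of $\{\epsilon_t\}$. Using the
substitution $\tau := t - s$ and the trigonometric identity (1.14), we can write $T_1$ as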

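\[ T_1 = \sum_{k=1}^{q} \tfrac{1}{2}\beta_k^2 \sum_{|\tau|<n} r^\epsilon_{\tau+i-j} \Bigl\{\Bigl(1 - \frac{|\tau|}{n}\Bigr)\cos(\omega_k\tau) + n^{-1}\sum_{t\in D} \cos(\omega_k(2t - \tau) + 2\phi_k)\Bigr\}
 + \sum_{\substack{k,k'=1 \\ k \ne k'}}^{q} \beta_k\beta_{k'} \sum_{|\tau|<n} r^\epsilon_{\tau+i-j}\, n^{-1}\sum_{t\in D} \cos(\omega_k t + \phi_k)\cos(\omega_{k'}(t - \tau) + \phi_{k'}) \]

where $D := \{t : \max(1, \tau + 1) \le t \le \min(n, \tau + n)\}$. Clearly, for any $\tau$ and $k \ne k'$,
the two summations over $t \in D$ are bounded in absolute value by $n$. In addition, it is
easy to show that

\[ n^{-1}\sum_{t\in D} \cos(\omega_k(2t - \tau) + 2\phi_k) \to 0 \]

\[ n^{-1}\sum_{t\in D} \cos(\omega_k t + \phi_k)\cos(\omega_{k'}(t - \tau) + \phi_{k'}) \to 0 \]

as $n \to \infty$ for any $\tau$ and $k \ne k'$. Since $\sum_\tau |r^\epsilon_\tau| < \infty$, it follows from the bounded
convergence theorem that

\[ T_1 \to \sum_{k=1}^{q} \tfrac{1}{2}\beta_k^2 \sum_{\tau=-\infty}^{\infty} r^\epsilon_{\tau+i-j}\cos(\omega_k\tau)
 = \sum_{k=1}^{q} \tfrac{1}{2}\beta_k^2 \sum_{\tau=-\infty}^{\infty} r^\epsilon_\tau \cos(\omega_k(\tau - i + j)) \]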

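as $n \to \infty$. Similarly, we obtain

\[ T_2 \to \sum_{k=1}^{q} \tfrac{1}{2}\beta_k^2 \sum_{\tau=-\infty}^{\infty} r^\epsilon_\tau \cos(\omega_k(\tau - i - j)) \]

\[ T_3 \to \sum_{k=1}^{q} \tfrac{1}{2}\beta_k^2 \sum_{\tau=-\infty}^{\infty} r^\epsilon_\tau \cos(\omega_k(\tau + i + j)) \]

\[ T_4 \to \sum_{k=1}^{q} \tfrac{1}{2}\beta_k^2 \sum_{\tau=-\infty}^{\infty} r^\epsilon_\tau \cos(\omega_k(\tau + i - j)). \]

Since the symmetry of $r^\epsilon_\tau$ implies that $\sum_{\tau=-\infty}^{\infty} r^\epsilon_\tau \sin(\omega_k\tau) = 0$, adding up these
expressions, followed by an application of (1.14), yields

\[ I_1 \to \sum_{k=1}^{q} 2\beta_k^2 \cos(\omega_k i)\cos(\omega_k j) \sum_{\tau=-\infty}^{\infty} r^\epsilon_\tau \cos(\omega_k\tau). \]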

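Furthermore, it is easy to verify from (1.2) that

\[ c(\tau + u, v) = E(\epsilon_{\tau+u}\epsilon_v\epsilon_0) = E(\xi_t^3) \sum_{j=-\infty}^{\infty} \psi_{j+\tau+u}\psi_{j+v}\psi_j \]

and hence $\sum_\tau |c(\tau + u, v)| < \infty$ for any fixed $u$ and $v$. Following an argument similar to
the one used for $T_1$ leads to $I_2 \to 0$. Finally, according to the classical results (Brockwell
and Davis, 1987, Proposition 7.3.1), we obtain

\[ I_3 \to (\kappa - 3)\, r^\epsilon_i r^\epsilon_j + \sum_{\tau=-\infty}^{\infty} \bigl(r^\epsilon_\tau r^\epsilon_{\tau+i-j} + r^\epsilon_{\tau+i} r^\epsilon_{\tau-j}\bigr). \]

The assertion follows immediately upon combining the limits of $I_1$, $I_2$, and $I_3$. $\Box$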

Remark 3.1 As can be seen from (3.3), the asymptotic covariance $\sigma_{ij}$ consists of
two parts. The first part (namely, the first term in (3.3)) is completely due to the
presence of sinusoids in $\{y_t\}$, while the second part, i.e., the last two terms in (3.3),
comes from the classical results for a time series with continuous spectrum (Brockwell
and Davis, 1987, Proposition 7.3.1).

Using Lemma 3.1, we now present the central limit theorem (CLT) for $\tilde r_j$ in (3.2).

Theorem 3.1 Assume that $E(\xi_t^4) = \kappa\sigma_\xi^4 < \infty$. Then, $n^{1/2}(\tilde r_j - r^y_j)$, $(j = 0, 1, \ldots, p)$,
are asymptotically jointly normal with mean zero and covariance matrix $[\sigma_{ij}]$, $(i, j =
0, 1, \ldots, p)$, where $\sigma_{ij}$ is defined by (3.3).

Since the proof of Theorem 3.1 is long and tedious, we break it up into a series
of lemmas. For this purpose, we note that equipped with the Cramer-Wold device
(Brockwell and Davis, 1987, Proposition 6.3.1), all we need is to show that for any
$\{\lambda_j, j = 0, 1, \ldots, p\} \ne \{0\}$, it holds that $n^{1/2}\sum_{j=0}^{p} \lambda_j(\tilde r_j - r^y_j) \xrightarrow{D} N(0, \sigma^2)$, where

\[ \sigma^2 := \sum_{i,j=0}^{p} \lambda_i\lambda_j\sigma_{ij} > 0. \tag{3.7} \]

This, according to (3.6), can be accomplished by showing that

\[ n^{-1/2}\sum_{t=1}^{n} \zeta_t \xrightarrow{D} N(0, \sigma^2) \tag{3.8} \]

where

\[ \zeta_t := \sum_{j=0}^{p} \lambda_j\zeta_{jt} \tag{3.9} \]

and $\zeta_{jt}$ is defined by (3.5).

We first consider the case where only finitely many $\psi_j$ in (1.2) are nonzero. The
following lemma shows that in this case the sum in (3.8) is asymptotically equivalent
to a martingale (i.e., a sum of martingale differences).

Lemma 3.2 Suppose that $E(\xi_t^4) = \kappa\sigma_\xi^4 < \infty$, and that the sequence $\{\psi_j\}$ in (1.2) has
a finite length, i.e., $\psi_j = 0$ for all $|j| > m$, with $m$ being a positive integer. Then,

\[ \sum_{t=1}^{n} \zeta_t = \sum_{t=1}^{n} M_t + O_p(1) \]

where $\{M_t\}$ is a martingale difference sequence with respect to the filtration $\mathcal{F}_t$
generated by $\{\xi_s, s \le t\}$ for $t > 0$.




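Proof. Notice that from (3.5) we can write

\[ \sum_{t=1}^{n} \zeta_{jt} = J_1 + J_2 + J_3 \tag{3.10} \]

where

\[ J_1 := \sum_{t=1}^{n} x_t\epsilon_{t+j}, \qquad J_2 := \sum_{t=1}^{n} x_{t+j}\epsilon_t, \qquad\text{and}\qquad J_3 := \sum_{t=1}^{n} (\epsilon_{t+j}\epsilon_t - r^\epsilon_j). \]

By definition, $\epsilon_t = \sum_u \psi_u\xi_{t-u}$. Therefore, we have

\[ J_3 = \sum_{u,v=-m}^{m} \psi_u\psi_v \sum_{t=1}^{n} (\xi_{t+j-u}\xi_{t-v} - \sigma_\xi^2\delta_{u-v-j})
 = \sum_{u,v=-m}^{m} \psi_u\psi_v \sum_{t=j-u+1}^{j-u+n} (\xi_t\xi_{t+u-v-j} - \sigma_\xi^2\delta_{u-v-j}) \]

where $\delta_u$ is Kronecker's delta function, i.e., $\delta_u = 0$ for $u \ne 0$ and $\delta_0 = 1$, and the
second equality is obtained by replacing $t + j - u$ with $t$ in the first expression. Given
$u$ and $v$, the variance of $\xi_t\xi_{t+u-v-j} - \sigma_\xi^2\delta_{u-v-j}$ is finite and independent of $t$. Therefore,
any weighted sum of these quantities over a finite region$^1$ of $(t, u, v)$ can be written
as $O_p(1)$. Armed with this fact, we add and subtract a finite number of $\xi_t\xi_{t+u-v-j} - \sigma_\xi^2\delta_{u-v-j}$
in the last expression of $J_3$ and obtain

\[ J_3 = \sum_{u,v=-m}^{m} \psi_u\psi_v \sum_{t=1}^{n} (\xi_t\xi_{t+u-v-j} - \sigma_\xi^2\delta_{u-v-j}) + O_p(1). \]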


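Moreover, using the substitution $\tau = -(u - v - j)$ and the fact (see equation (1.4))
that $r^\epsilon_{\tau-j} = \sigma_\xi^2\sum_v \psi_{v-\tau+j}\psi_v$, we can also write

\[ J_3 = \sigma_\xi^{-2}\sum_{\tau=-2m+j}^{2m+j} r^\epsilon_{\tau-j} \sum_{t=1}^{n} (\xi_t\xi_{t-\tau} - \sigma_\xi^2\delta_\tau) + O_p(1)
 = \sigma_\xi^{-2}\Bigl(\sum_{\tau=-2m+j}^{-1} + \sum_{\tau=0}^{2m+j}\Bigr) r^\epsilon_{\tau-j} \sum_{t=1}^{n} (\xi_t\xi_{t-\tau} - \sigma_\xi^2\delta_\tau) + O_p(1). \tag{3.11} \]

Replacing $\tau$ with $-\tau$ and then substituting $t + \tau$ by $t$, the first term in (3.11) becomes

\[ \sigma_\xi^{-2}\sum_{\tau=1}^{2m-j} r^\epsilon_{\tau+j} \sum_{t=1}^{n} \xi_t\xi_{t+\tau} = \sigma_\xi^{-2}\sum_{\tau=1}^{2m-j} r^\epsilon_{\tau+j} \sum_{t=\tau+1}^{\tau+n} \xi_t\xi_{t-\tau}. \]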


1  Namely,  the  size  of  the  region  is  finite  and  independent  of  n. 


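Therefore, by adding and subtracting a finite number of $\xi_t\xi_{t-\tau}$ in the last expression,
we can write the first term in (3.11) as

\[ \sigma_\xi^{-2}\sum_{\tau=1}^{2m-j} r^\epsilon_{\tau+j} \sum_{t=1}^{n} \xi_t\xi_{t-\tau} + O_p(1). \tag{3.12} \]

Notice that $r^\epsilon_{\tau-j} = 0$ for $\tau > 2m + j$ and $r^\epsilon_{\tau+j} = 0$ for $\tau > 2m - j$. As a consequence,
we can extend the summations in (3.11) and (3.12) to $0 \le \tau < \infty$ and write

\[ J_3 = \sum_{t=1}^{n}\sum_{\tau=0}^{\infty} B_{j\tau}(\xi_t\xi_{t-\tau} - \sigma_\xi^2\delta_\tau) + O_p(1) \tag{3.13} \]

where

\[ B_{j0} := r^\epsilon_j/\sigma_\xi^2, \quad\text{and}\quad B_{j\tau} := (r^\epsilon_{\tau+j} + r^\epsilon_{\tau-j})/\sigma_\xi^2 \ \text{ for } \tau > 0. \tag{3.14} \]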

Note that $B_{j\tau} = 0$ for $\tau > 2m + p$. Similarly, $J_1$ can be expressed as

\[ J_1 = \sum_{u=-m}^{m} \psi_u \sum_{t=1}^{n} \xi_{t+j-u}x_t = \sum_{u=-m}^{m} \psi_u \sum_{t=j-u+1}^{j-u+n} \xi_t x_{t+u-j} \]

where the second equality is obtained by substituting $t + j - u$ with $t$. Since the
variance of $\xi_t x_{t+u-j}$ can be bounded by a constant, an argument analogous to the one
we employed earlier leads to the representation

\[ J_1 = \sum_{u=-m}^{m} \psi_u \sum_{t=1}^{n} \xi_t x_{t+u-j} + O_p(1)
 = \sum_{t=1}^{n} \xi_t \sum_{u=-m}^{m} \psi_u x_{t+u-j} + O_p(1). \tag{3.15} \]

In a similar way, we can write $J_2$ as

\[ J_2 = \sum_{t=1}^{n} \xi_t \sum_{u=-m}^{m} \psi_u x_{t+u+j} + O_p(1). \]

This, in connection with (3.13), (3.15), and (3.10), gives the following representation

\[ \sum_{t=1}^{n} \zeta_{jt} = \sum_{t=1}^{n} \Bigl\{A_{jt}\xi_t + \sum_{\tau=0}^{\infty} B_{j\tau}(\xi_t\xi_{t-\tau} - \sigma_\xi^2\delta_\tau)\Bigr\} + O_p(1) \tag{3.16} \]

where

\[ A_{jt} := \sum_{u=-m}^{m} \psi_u (x_{t+u+j} + x_{t+u-j}). \tag{3.17} \]



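Now let us define

\[ M_t := A_t\xi_t + \sum_{\tau=0}^{\infty} B_\tau(\xi_t\xi_{t-\tau} - \sigma_\xi^2\delta_\tau) \tag{3.18} \]

with $A_t := \sum_j \lambda_j A_{jt}$ and $B_\tau := \sum_j \lambda_j B_{j\tau}$. Then, from (3.9) and (3.16), we obtain

\[ \sum_{t=1}^{n} \zeta_t = \sum_{j=0}^{p} \lambda_j \sum_{t=1}^{n} \zeta_{jt} = \sum_{t=1}^{n} M_t + O_p(1). \]

Since the $A$'s and $B$'s are constants, it can be easily shown that $\{M_t\}$ is a martingale
difference sequence with respect to the filtration $\mathcal{F}_t$ generated by $\{\xi_s, s \le t\}$.
Indeed, the measurability of $M_t$ with respect to $\mathcal{F}_t$ is obvious from the definition (3.18).
Furthermore, we have

\[ E(M_t \mid \mathcal{F}_{t-1}) = A_t E(\xi_t) + B_0 E(\xi_t^2 - \sigma_\xi^2) + \sum_{\tau=1}^{\infty} B_\tau\xi_{t-\tau}E(\xi_t) = 0. \]

The lemma is thus proved. $\Box$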

As indicated by Lemma 3.2, the central limit theorem (3.8) can be established if
we can prove the same result for the martingale difference sequence $\{M_t\}$. To this
end, we consider the quantities

\[ V_n^2 := \sum_{t=1}^{n} E(M_t^2 \mid \mathcal{F}_{t-1}) \quad\text{and}\quad s_n^2 := E(V_n^2). \tag{3.19} \]

The following lemma shows that both $n^{-1}V_n^2$ and $n^{-1}s_n^2$ converge to the same limit $\sigma^2$
as $n \to \infty$.

Lemma 3.3 Assume that the conditions in Lemma 3.2 are satisfied, and let $V_n^2$ and
$s_n^2$ be defined by (3.19). Then, as $n \to \infty$, $n^{-1}V_n^2 \xrightarrow{P} \sigma^2$, $n^{-1}s_n^2 \to \sigma^2$, and hence
$V_n^2/s_n^2 \xrightarrow{P} 1$, where $\sigma^2$ is given by (3.7).

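Proof. From (3.18), it is easy to show by straightforward computations that

\[ E(M_t^2 \mid \mathcal{F}_{t-1}) = \sigma_\xi^2 A_t^2 + 2B_0E(\xi_t^3)A_t + 2\sigma_\xi^2\sum_{\tau=1}^{\infty} B_\tau A_t\xi_{t-\tau} + B_0^2E(\xi_t^2 - \sigma_\xi^2)^2
 + 2B_0E(\xi_t^3)\sum_{\tau=1}^{\infty} B_\tau\xi_{t-\tau} + \sigma_\xi^2\sum_{s,\tau=1}^{\infty} B_sB_\tau\xi_{t-s}\xi_{t-\tau}. \tag{3.20} \]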




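To verify that $n^{-1}V_n^2$ converges to $\sigma^2$, we first note that for any fixed $r$ and $s$,

\[ n^{-1}\sum_{t=1}^{n} x_{t+r}x_{t+s} \to \sum_{k=1}^{q} \tfrac{1}{2}\beta_k^2\cos(\omega_k(r - s)) = r^x_{r-s} \tag{3.21} \]

as $n \to \infty$. Therefore, from (3.17) and (3.21), we can write

\[ n^{-1}\sum_{t=1}^{n} A_t^2 = \sum_{i,j} \lambda_i\lambda_j \sum_{u,v} \psi_u\psi_v\, n^{-1}\sum_{t=1}^{n} (x_{t+u+i} + x_{t+u-i})(x_{t+v+j} + x_{t+v-j})
 \to \sum_{i,j} \lambda_i\lambda_j \sum_{u,v} \psi_u\psi_v \bigl(r^x_{u-v+i-j} + r^x_{u-v+i+j} + r^x_{u-v-i-j} + r^x_{u-v-i+j}\bigr). \tag{3.22} \]

For any $\omega$, it is easy to verify that $\sum_{u,v} \psi_u\psi_v\sin(\omega(u - v)) = \Im\{|\sum_u \psi_u\exp(i\omega u)|^2\} = 0$
and, from (1.4),

\[ \sum_{u,v} \psi_u\psi_v\cos(\omega(u - v)) = \sigma_\xi^{-2}\sum_{\tau=-\infty}^{\infty} r^\epsilon_\tau\cos(\omega\tau). \]

Therefore, it follows that for any $s$,

\[ \sum_{u,v} \psi_u\psi_v r^x_{u-v+s} = \sum_{k=1}^{q} \tfrac{1}{2}\beta_k^2 \sum_{u,v} \psi_u\psi_v\cos(\omega_k(u - v + s))
 = \sigma_\xi^{-2}\sum_{k=1}^{q} \tfrac{1}{2}\beta_k^2\cos(\omega_k s) \sum_{\tau=-\infty}^{\infty} r^\epsilon_\tau\cos(\omega_k\tau). \]

Using this expression in (3.22) yields

\[ n^{-1}\sum_{t=1}^{n} A_t^2 \to \sigma_\xi^{-2}\sum_{i,j=0}^{p} \lambda_i\lambda_j \sum_{k=1}^{q} 2\beta_k^2\cos(\omega_k i)\cos(\omega_k j) \sum_{\tau=-\infty}^{\infty} r^\epsilon_\tau\cos(\omega_k\tau). \tag{3.23} \]

Similarly, since $n^{-1}\sum_{t=1}^{n} x_{t+s} \to 0$ for any $s$, it follows from (3.17) that

\[ n^{-1}\sum_{t=1}^{n} A_t = \sum_j \lambda_j \sum_u \psi_u\, n^{-1}\sum_{t=1}^{n} (x_{t+u+j} + x_{t+u-j}) \to 0. \tag{3.24} \]

In addition, it can be shown (An, et al., 1983) that

\[ n^{-1}\sum_{t=1}^{n} \cos(\omega(t + s) + \phi)\,\xi_{t-\tau} = O_p\bigl(\sqrt{n^{-1}\log n}\bigr) \xrightarrow{P} 0 \]

for any fixed $\omega$, $\phi$, $s$, and $\tau$. This implies that for any $s$ and $\tau$,

\[ n^{-1}\sum_{t=1}^{n} x_{t+s}\xi_{t-\tau} \xrightarrow{P} 0 \]

as $n \to \infty$. Therefore, for any $\tau$, we obtain

\[ n^{-1}\sum_{t=1}^{n} A_t\xi_{t-\tau} = \sum_j \lambda_j \sum_u \psi_u\, n^{-1}\sum_{t=1}^{n} (x_{t+u+j} + x_{t+u-j})\xi_{t-\tau} \xrightarrow{P} 0. \tag{3.25} \]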

t  =  l  j  U  t  —  1 

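Finally, by the law of large numbers, we have

\[ n^{-1}\sum_{t=1}^{n} \xi_{t-\tau} \xrightarrow{P} 0 \quad\text{and}\quad n^{-1}\sum_{t=1}^{n} \xi_{t-s}\xi_{t-\tau} \xrightarrow{P} \sigma_\xi^2\delta_{\tau-s} \]

for any $s$ and $\tau$. This, together with (3.23), (3.24), (3.25), and (3.20), implies that

\[ n^{-1}V_n^2 \xrightarrow{P} \sum_{i,j=0}^{p} \lambda_i\lambda_j \sum_{k=1}^{q} 2\beta_k^2\cos(\omega_k i)\cos(\omega_k j) \sum_{\tau=-\infty}^{\infty} r^\epsilon_\tau\cos(\omega_k\tau)
 + B_0^2E(\xi_t^2 - \sigma_\xi^2)^2 + \sigma_\xi^4\sum_{\tau=1}^{\infty} B_\tau^2. \]

Using (3.14), it can be easily verified that

\[ B_0^2E(\xi_t^2 - \sigma_\xi^2)^2 + \sigma_\xi^4\sum_{\tau=1}^{\infty} B_\tau^2
 = \sum_{i,j=0}^{p} \lambda_i\lambda_j \Bigl\{(\kappa - 3)\, r^\epsilon_i r^\epsilon_j + \sum_{\tau=-\infty}^{\infty} \bigl(r^\epsilon_\tau r^\epsilon_{\tau+i-j} + r^\epsilon_{\tau+i} r^\epsilon_{\tau-j}\bigr)\Bigr\}. \tag{3.26} \]

Thus we have proved that $n^{-1}V_n^2 \xrightarrow{P} \sigma^2$. To show that $n^{-1}s_n^2 \to \sigma^2$, we note from
(3.20) that

\[ n^{-1}s_n^2 = n^{-1}\sum_{t=1}^{n} E\{E(M_t^2 \mid \mathcal{F}_{t-1})\}
 = \sigma_\xi^2\, n^{-1}\sum_{t=1}^{n} A_t^2 + 2B_0E(\xi_t^3)\, n^{-1}\sum_{t=1}^{n} A_t + B_0^2E(\xi_t^2 - \sigma_\xi^2)^2 + \sigma_\xi^4\sum_{\tau=1}^{\infty} B_\tau^2, \]

where the second term vanishes in the limit by (3.24). The assertion follows immediately
from (3.23) and (3.26). $\Box$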

Armed with these lemmas, we claim in the following lemma that (3.8) holds when
$\{\psi_j\}$ has a finite length.

Lemma 3.4 Suppose that $E(\xi_t^4) = \kappa\sigma_\xi^4 < \infty$, and that the sequence $\{\psi_j\}$ in (1.2) has
a finite length. Then, the central limit theorem (3.8) holds as $n \to \infty$.




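Proof. By Lemma 3.2, it suffices to show that

\[ n^{-1/2}\sum_{t=1}^{n} M_t \xrightarrow{D} N(0, \sigma^2). \tag{3.27} \]

As Lemma 3.3 has been proved, all we need, according to the central limit theorem for
martingales (Brown, 1971), is to verify the Lindeberg condition (Brown, 1971, eq. 2)

\[ s_n^{-2}\sum_{t=1}^{n} E\{M_t^2 I(|M_t| \ge \varepsilon s_n)\} \to 0 \]

as $n \to \infty$, where $I(\cdot)$ is the indicator. Since $n^{-1}s_n^2 \to \sigma^2$, as shown in Lemma 3.3, the
Lindeberg condition is equivalent to

\[ n^{-1}\sum_{t=1}^{n} E\{M_t^2 I(|M_t| \ge \varepsilon s_n)\} \to 0. \]

Because $s_n^2$ is nondecreasing in $n$, each summand is bounded by $E\{M_t^2 I(|M_t| \ge \varepsilon s_t)\}$, so by
Cesaro averaging it suffices to verify that $E\{M_t^2 I(|M_t| \ge \varepsilon s_t)\} \to 0$ as $t \to \infty$ for any $\varepsilon > 0$.
To this end, we note that $|A_t| \le A$ for some $A > 0$ and all $t$. Therefore,

\[ |M_t| \le U_t := A|\xi_t| + \Bigl|\sum_{\tau=0}^{\infty} B_\tau(\xi_t\xi_{t-\tau} - \sigma_\xi^2\delta_\tau)\Bigr|. \]

Moreover, the convergence of $t^{-1}s_t^2$ to $\sigma^2 > 0$ as $t \to \infty$ implies that $s_t \ge \varepsilon t^{1/2}$ for
small $\varepsilon > 0$ and large $t$. Combining these results, we can write

\[ E\{M_t^2 I(|M_t| \ge \varepsilon s_t)\} \le E\{U_t^2 I(U_t \ge \varepsilon^2 t^{1/2})\} = E\{U_0^2 I(U_0 \ge \varepsilon^2 t^{1/2})\} \to 0 \]

as $t \to \infty$, where the equality is due to the stationarity of $\{U_t\}$ and the limit to the
finiteness of $E(U_0^2)$. Applying Theorem 2 of Brown (1971) proves (3.27), and hence
the lemma. $\Box$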

We now complete the proof of Theorem 3.1 by showing that (3.8) also holds if $\{\psi_j\}$
has an infinite length.

Proof of Theorem 3.1. When $\{\psi_j\}$ is of infinite length, the central limit theorem
(3.8) can be obtained by following the proof of Proposition 7.3.3 (Brockwell and Davis,
1987). In fact, for any $m > 0$, let us define

\[ \epsilon^m_t := \sum_{j=-m}^{m} \psi_j\xi_{t-j}, \qquad r^m_\tau := E(\epsilon^m_{t+\tau}\epsilon^m_t), \]

\[ \zeta^m_{jt} := x_t\epsilon^m_{t+j} + x_{t+j}\epsilon^m_t + \epsilon^m_{t+j}\epsilon^m_t - r^m_j, \quad\text{and}\quad \zeta^m_t := \sum_{j=0}^{p} \lambda_j\zeta^m_{jt}. \]

Lemma 3.4 guarantees that for any fixed $m$,

\[ n^{-1/2}\sum_{t=1}^{n} \zeta^m_t \xrightarrow{D} S_m \sim N(0, \sigma_m^2) \]

as $n \to \infty$, where $\sigma_m^2 := \sum_{i,j=0}^{p} \lambda_i\lambda_j\sigma^m_{ij}$ and $\sigma^m_{ij}$ is defined by (3.3) with the
autocovariance function $r^\epsilon_\tau$ replaced by $r^m_\tau$. It is not difficult to verify that $\sigma_m^2 \to \sigma^2$ as $m \to \infty$.
This implies that $S_m \xrightarrow{D} N(0, \sigma^2)$. Moreover, a straightforward calculation shows that

\[ \lim_{m\to\infty}\limsup_{n\to\infty}\, n^{-1}E\Bigl\{\sum_{t=1}^{n} (\zeta^m_t - \zeta_t)\Bigr\}^2 = 0. \]

Using Chebychev's inequality, we obtain

\[ \lim_{m\to\infty}\limsup_{n\to\infty}\, \Pr\Bigl\{n^{-1/2}\Bigl|\sum_{t=1}^{n} (\zeta^m_t - \zeta_t)\Bigr| > \varepsilon\Bigr\} = 0 \]

for any $\varepsilon > 0$. The proof is then completed by applying Proposition 6.3.9 (Brockwell
and Davis, 1987). $\Box$

With the help of Theorem 3.1, we now claim the asymptotic normality of the
sample autocovariances $\hat r_j$ in (3.1).

Theorem 3.2 Assume that $E(\xi_t^4) = \kappa\sigma_\xi^4 < \infty$. Then, $n^{1/2}(\hat r_j - r^y_j)$, $(j = 0, 1, \ldots, p)$,
are asymptotically jointly normal with mean zero and covariance matrix $[\sigma_{ij}]$, $(i, j =
0, 1, \ldots, p)$, where $\sigma_{ij}$ is defined by (3.3).

Proof. Since the asymptotic normality has been proved in Theorem 3.1 for $\tilde r_j$, it
suffices to show that $n^{1/2}(\tilde r_j - \hat r_j) = o_P(1)$ for $j = 1, \ldots, p$. To this end, we note that

\[ n^{1/2}(\tilde r_j - \hat r_j) = n^{-1/2}\sum_{t=n-j+1}^{n} (x_{t+j}x_t + x_{t+j}\epsilon_t + x_t\epsilon_{t+j} + \epsilon_{t+j}\epsilon_t). \tag{3.28} \]

Since $|x_t| \le \beta := \sum_k \beta_k$ for all $t$, the first term in (3.28) is $o(1)$. For the same reason,

\[ E\Bigl|\sum_{t=n-j+1}^{n} x_{t+j}\epsilon_t\Bigr| \le \beta\sum_{t=n-j+1}^{n} E|\epsilon_t| \le p\beta E|\epsilon_0| < \infty. \]

By Markov's inequality, this implies that the second term in (3.28) is $o_P(1)$. The same
conclusion applies to the third term in (3.28). Finally, since

\[ E\Bigl|\sum_{t=n-j+1}^{n} \epsilon_{t+j}\epsilon_t\Bigr| \le jE|\epsilon_j\epsilon_0| \le p\,r^\epsilon_0 < \infty, \]

the last term in (3.28) is also $o_P(1)$. The theorem is thus proved. $\Box$

As a direct consequence of Theorem 3.2, the asymptotic normality of the sample
autocorrelation $\hat\rho_j := \hat r_j/\hat r_0$ can also be established as follows.

Corollary 3.1 Suppose that the conditions in Theorem 3.2 are satisfied. Then, for
any fixed $j \ge 1$, $n^{1/2}(\hat\rho_j - \rho_j)$ is asymptotically normal with mean zero and variance
$V_j$, where $\rho_j := r^y_j/r^y_0$,

\[ V_j := (\rho_j^2\sigma_{00} - 2\rho_j\sigma_{0j} + \sigma_{jj})/(r^y_0)^2, \tag{3.29} \]

and $\sigma_{ij}$ is given by (3.3).

Proof. The assertion follows immediately from Theorem 3.2 and the "delta method"
(Brockwell and Davis, 1987, Proposition 6.4.3). $\Box$
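The variance (3.29) is the quadratic form $g^T \Sigma g$ with gradient $g = (-r^y_j/(r^y_0)^2,\ 1/r^y_0)$ of the map $(r_0, r_j) \mapsto r_j/r_0$ and $\Sigma$ the covariance matrix of $(\hat r_0, \hat r_j)$ from (3.3). A minimal Python sketch follows; the function name and argument names are illustrative assumptions.

```python
def autocorr_asymptotic_variance(rho_j, s00, s0j, sjj, r0):
    """V_j in (3.29): (rho_j^2 * sigma_00 - 2 rho_j * sigma_0j + sigma_jj) / r0^2."""
    return (rho_j**2 * s00 - 2.0 * rho_j * s0j + sjj) / r0**2

# Unit-variance Gaussian white noise without sinusoids: by (3.3),
# sigma_00 = 2, sigma_0j = 0, and sigma_jj = 1 for j >= 1, while rho_j = 0.
v_white = autocorr_asymptotic_variance(rho_j=0.0, s00=2.0, s0j=0.0, sjj=1.0, r0=1.0)
```

In this special case the result reduces to the familiar unit asymptotic variance of $n^{1/2}\hat\rho_j$ for white noise.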

We  would  like  to  end  this  section  by  making  the  following  remarks. 

Remark 3.2 The asymptotic normality of SACF has been proved by Mackisack
and Poskitt (1989) for the simplest case where $q = 1$ and $\{\epsilon_t\}$ is an i.i.d. random
sequence (a single sinusoid in white noise). Therefore, Theorem 3.1 and Theorem 3.2
extend this result to the case of multiple sinusoids in colored noise.

Remark 3.3 As we remarked earlier, the proof of Theorem 3.1 and Theorem 3.2
is based upon the assumption that the phases $\phi_k$ are constants. Since the asymptotic
distribution of $\tilde r_j$ and $\hat r_j$ does not depend on $\phi_k$, the same conclusion also holds for
random phases.

Remark 3.4 The proof of Theorem 3.2 indicates that the same central limit theorem
holds for any sample autocovariances of the form

\[ \hat r_j = n^{-1}\sum_{t=u}^{n+v} y_{t+j}\,y_t \]

where the integers $u = u(j)$ and $v = v(j)$ are independent of $n$.

3.2  CLT  for  SACF  from  Filtered  Process 

In  this  section,  we  consider  the  sample  autocovariance  function  from  a  filtered  time 
series.  We  show  that  the  asymptotic  normality  remains  valid  after  filtering,  provided 
that  the  filter  is  strictly  stable.  The  definition  of  a  strictly  stable  filter  is  as  follows 
(Ljung,  1987). 

Definition  3.1  A  linear  time-invariant  causal  filter  {hj},  ( j  =  0, 1, . . .),  is  said  to  be 
strictly  stable  if  3 1  hj  I  <  00 • 

Suppose  that  { hj }  is  a  strictly  stable  filter,  and  denote  its  transfer  function  by 

OO 

A  (")  ==  £  V"'" 

;=  0 

For a given time series $\{y_1, \ldots, y_n\}$ from (1.1), define the filtered time series by

$$\hat y_t(h) := \sum_{j=0}^{t-1} h_j\,y_{t-j} \qquad (t = 1, \ldots, n).$$

The sample autocovariances of the filtered time series are defined by

$$\hat r_j(h) := n^{-1}\sum_{t=1}^{n-j} \hat y_{t+j}(h)\,\hat y_t(h) \qquad (j = 0, 1, \ldots, p). \qquad (3.30)$$

Note that the filtered time series $\{\hat y_1(h), \ldots, \hat y_n(h)\}$ depends completely on the given data record $\{y_1, \ldots, y_n\}$. For convenience, we introduce $\{y_t(h)\}$ where

$$y_t(h) := \sum_{j=0}^{\infty} h_j\,y_{t-j}.$$
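The two definitions above translate directly into code. The following is a minimal sketch, assuming the impulse response is supplied as a finite list of leading taps (taps beyond the list are treated as zero); the function names `filter_truncated` and `sample_autocov` are illustrative and do not come from the source.

```python
def filter_truncated(h, y):
    """Return [yhat_1, ..., yhat_n] with yhat_t = sum_{j=0}^{t-1} h[j] * y_{t-j}.

    h holds the leading taps h_0, h_1, ...; taps beyond len(h) are taken
    to be zero (an assumption of this sketch).
    y holds the data record y_1, ..., y_n, so y[0] is y_1.
    """
    n = len(y)
    out = []
    for t in range(1, n + 1):
        upper = min(t, len(h))  # j runs over 0..t-1, limited by the available taps
        out.append(sum(h[j] * y[t - j - 1] for j in range(upper)))
    return out


def sample_autocov(z, j):
    """Sample autocovariance n^{-1} * sum_{t=1}^{n-j} z_{t+j} * z_t, as in (3.30)."""
    n = len(z)
    return sum(z[t + j] * z[t] for t in range(n - j)) / n


# Tiny deterministic example.
y = [1.0, 2.0, 3.0]
h = [1.0, 0.5]
yhat = filter_truncated(h, y)   # [1.0, 2.5, 4.0]
r0 = sample_autocov(yhat, 0)    # (1 + 6.25 + 16) / 3
r1 = sample_autocov(yhat, 1)    # (2.5 + 10) / 3
```

Only data actually observed enter the truncated sum, which is why the first filtered values use fewer taps than later ones.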




It is clear that $y_t(h)$ requires the entire history of the process $\{y_t\}$ up to time $t$. From (1.1), we can rewrite $\{y_t(h)\}$ as

$$y_t(h) = \sum_{k=1}^{q} \beta_k(h)\cos(\omega_k t + \phi_k(h)) + \epsilon_t(h) \qquad (3.31)$$

where $\beta_k(h) := \beta_k\,|H(\omega_k)|$ and $\phi_k(h) := \phi_k + \arg\{H(\omega_k)\}$ are the amplitudes and phases of the filtered sinusoids, and $\{\epsilon_t(h)\}$ is the filtered noise as specified by

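$$\epsilon_t(h) := \sum_{j=0}^{\infty} h_j\,\epsilon_{t-j} = \sum_{j=-\infty}^{\infty} \varphi_j\,\xi_{t-j}$$

where $\varphi_j := \sum_u \psi_u h_{j-u}$. Clearly, the filtered signal remains a sum of $q$ sinusoids with the same frequencies, while the filtered noise $\{\epsilon_t(h)\}$ is still a linear process.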

Let $r_\tau^y(h)$ and $r_\tau^\epsilon(h)$ denote the autocovariance functions of $\{y_t(h)\}$ and $\{\epsilon_t(h)\}$, respectively. Then, we have the following theorem regarding the asymptotic normality of the sample autocovariances $\hat r_j(h)$ in (3.30).

Theorem 3.3 Suppose that the filter $\{h_j\}$ is strictly stable, and that $E(\xi_t^4) = \kappa\sigma^4 < \infty$. Then, as $n$ tends to infinity, $n^{1/2}(\hat r_j(h) - r_j^y(h))$, $(j = 0, 1, \ldots, p)$, are asymptotically jointly normal with mean zero and covariance matrix $[\sigma_{ij}(h)]$, $(i, j = 0, 1, \ldots, p)$, where $\sigma_{ij}(h)$ can be represented by (3.3), except that the $\beta_k$ and $r_\tau^\epsilon$ are replaced by $\beta_k(h)$ and $r_\tau^\epsilon(h)$, respectively.

Proof. Because of the representation (3.31) and Theorem 3.2, the asymptotic normality can be established for

$$\tilde r_j(h) := n^{-1}\sum_{t=1}^{n-j} y_{t+j}(h)\,y_t(h) \qquad (j = 0, 1, \ldots, p).$$

Therefore, it suffices to show that $n^{1/2}(\tilde r_j(h) - \hat r_j(h)) = o_P(1)$. Let us define

$$\tilde y_t(h) := \sum_{j=t}^{\infty} h_j\,y_{t-j}.$$

Since $y_t(h) = \hat y_t(h) + \tilde y_t(h)$, a simple calculation shows that

$$n^{1/2}(\tilde r_j(h) - \hat r_j(h)) = n^{-1/2}\sum_{t=1}^{n-j}\{\hat y_{t+j}(h)\,\tilde y_t(h) + \tilde y_{t+j}(h)\,\hat y_t(h) + \tilde y_{t+j}(h)\,\tilde y_t(h)\} =: I_1 + I_2 + I_3.$$

Using (3.31), we can write $I_1$ as

$$I_1 = n^{-1/2}\sum_{t=1}^{n-j}(\hat x_{t+j}\tilde x_t + \hat x_{t+j}\tilde\epsilon_t + \hat\epsilon_{t+j}\tilde x_t + \hat\epsilon_{t+j}\tilde\epsilon_t) =: T_1 + T_2 + T_3 + T_4$$

where $\hat x_t$, $\hat\epsilon_t$, $\tilde x_t$, and $\tilde\epsilon_t$ are similarly defined (with the argument $h$ being omitted for brevity) as $\hat y_t(h)$ and $\tilde y_t(h)$. By definition, we have

$$T_1 = n^{-1/2}\sum_{t=1}^{n-j}\hat x_{t+j}\,\tilde x_t = n^{-1/2}\sum_{t=1}^{n-j}\sum_{u=0}^{t+j-1}\sum_{v=t}^{\infty} h_u h_v\,x_{t+j-u}\,x_{t-v}.$$

Since $|x_t|$ is bounded by $\beta := \sum_k \beta_k$ and $H := \sum_u |h_u|$ is finite, it follows that

$$|T_1| \le n^{-1/2}\beta^2 H\sum_{t=1}^{n}\sum_{v=t}^{\infty}|h_v| \le n^{-1/2}\beta^2 H\sum_{v=1}^{\infty} v\,|h_v|$$

and hence $T_1 = o(1)$. It can also be shown from (3.31) that

$$E|T_2| \le n^{-1/2} H\beta\,E|\epsilon_0|\sum_{v=1}^{\infty} v\,|h_v|.$$

Applying Markov’s inequality yields $T_2 = o_P(1)$. In a similar way, we can show that $T_3 = o_P(1)$ and $T_4 = o_P(1)$. Combining these results gives $I_1 = o_P(1)$. The same results can be obtained for $I_2$ and $I_3$ by an analogous argument. ◊

Remark 3.5 The asymptotic normality of the sample autocorrelation

$$\hat\rho_j(h) := \hat r_j(h)/\hat r_0(h)$$

can be obtained in the same way as Corollary 3.1, except that $\rho_j$, $\sigma_{ij}$, and $r_\tau^\epsilon$ are replaced by $\rho_j(h) := r_j^y(h)/r_0^y(h)$, $\sigma_{ij}(h)$, and $r_\tau^\epsilon(h)$, respectively.

Remark 3.6 As claimed in Remark 3.4, the same central limit theorem holds for any sample autocovariances of the form

$$\hat r_j(h) = n^{-1}\sum_{t=u}^{n+v}\hat y_{t+j}(h)\,\hat y_t(h)$$

where the integers $u = u(j)$ and $v = v(j)$ are independent of $n$.




3.3  Uniform  Strong  Consistency  of  SACF 


In this section, we consider the uniform consistency of SACF after parametric filtering. More precisely, let $\{h_j(\alpha)\}$, $(j = 0, 1, \ldots)$, be a parametric causal linear time-invariant filter, where $\alpha$ is a parameter (possibly a vector) that takes on values in $A$. Following the notation in Section 3.2, we define the filtered time series by

$$\hat y_t(\alpha) := \sum_{j=0}^{t-1} h_j(\alpha)\,y_{t-j} \qquad (t = 1, \ldots, n) \qquad (3.32)$$

and the sample autocovariances of the filtered time series by

$$\hat r_\tau(\alpha) := n^{-1}\sum_{t=1}^{n-\tau}\hat y_{t+\tau}(\alpha)\,\hat y_t(\alpha). \qquad (3.33)$$

We would like to show that under certain conditions $\hat r_\tau(\alpha)$ converges to $r_\tau^y(\alpha)$ almost surely as $n \to \infty$, and uniformly in $\alpha \in A$, where $r_\tau^y(\alpha)$ is the autocovariance function of the process

$$y_t(\alpha) := \sum_{j=0}^{\infty} h_j(\alpha)\,y_{t-j}. \qquad (3.34)$$

Recall that the noise $\{\epsilon_t\}$ is a linear process as defined in (1.2). In addition, we assume that the filter $\{h_j(\alpha)\}$ is uniformly strictly stable according to the following definition.

Definition 3.2 A causal linear filter $\{h_j(\alpha)\}$, $(j = 0, 1, \ldots)$, is said to be uniformly strictly stable if there exist constants $c_j > 0$ such that $\sum_j j\,c_j < \infty$ and $|h_j(\alpha)| \le c_j$ for all $j = 0, 1, \ldots$, and uniformly for all $\alpha \in A$.
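As a simple illustration of Definition 3.2 (an assumed example, not taken from the source), consider the one-pole family $h_j(\alpha) = \alpha^j$ with parameter set $A = \{\alpha : |\alpha| \le \rho\}$ for some fixed $\rho \in (0, 1)$. The dominating constants can be taken as $c_j = \rho^j$:

```latex
|h_j(\alpha)| = |\alpha|^j \le \rho^j = c_j
\quad \text{for all } \alpha \in A,
\qquad
\sum_{j=1}^{\infty} j\,c_j = \sum_{j=1}^{\infty} j\,\rho^j
  = \frac{\rho}{(1-\rho)^2} < \infty .
```

The bound fails if $A$ is allowed to approach the unit circle, which is why the dominating sequence must be chosen independently of $\alpha$.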

We  first  provide  a  general  result  concerning  the  uniform  strong  consistency  of  the 
sample  cross-covariances  between  two  differently  filtered  time  series. 

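Theorem 3.4 Let $\{h_j(\alpha)\}$ and $\{g_j(\alpha)\}$ be uniformly strictly stable filters with transfer functions $H(\omega; \alpha)$ and $G(\omega; \alpha)$, respectively. Let $F(\omega)$ be the spectral distribution function of $\{\epsilon_t\}$. Then, uniformly in $\alpha \in A$,

$$n^{-1}\sum_{t=1}^{n}\left(\sum_{j=0}^{t+\tau-1} h_j(\alpha)\,y_{t+\tau-j}\right)\left(\sum_{j=0}^{t-1} g_j(\alpha)\,y_{t-j}\right) \xrightarrow{a.s.} \sum_{k=1}^{q}\tfrac{1}{2}\beta_k^2\,\Re\{H(\omega_k; \alpha)\,\overline{G(\omega_k; \alpha)}\,e^{i\tau\omega_k}\} + \int_{-\pi}^{\pi} H(\omega; \alpha)\,\overline{G(\omega; \alpha)}\,e^{i\tau\omega}\,dF(\omega)$$

as $n \to \infty$ for any $\tau \ge 0$, where $\Re\{\cdot\}$ stands for the real part and the overbar for the complex conjugate of a complex number.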

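Before we prove this theorem, let us introduce the following lemma concerning the consistency of the sample covariance of $\{x_t\}$ and $\{\epsilon_t\}$.

Lemma 3.5 For any fixed $u$, $v$, and $w$, it is true that

$$n^{-1}\sum_{t=w}^{n} x_{t-u}\,x_{t-v} \xrightarrow{a.s.} r^x_{v-u} \quad\text{and}\quad n^{-1}\sum_{t=w}^{n}\epsilon_{t-u}\,\epsilon_{t-v} \xrightarrow{a.s.} r^\epsilon_{v-u}$$

as $n \to \infty$, where $r^x_\tau$ and $r^\epsilon_\tau$ are the autocovariance functions of $\{x_t\}$ and $\{\epsilon_t\}$, respectively. It is also true that

$$n^{-1}\sum_{t=w}^{n}\epsilon_{t-u}\,x_{t-v} \xrightarrow{a.s.} 0$$

for any fixed $u$, $v$, and $w$.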
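The first limit in Lemma 3.5 is easy to check numerically for a deterministic signal. The sketch below assumes a hypothetical two-sinusoid signal with fixed phases (all numerical values are illustrative) and compares the sample average with $r^x_{v-u} = \sum_k \tfrac{1}{2}\beta_k^2\cos(\omega_k(v-u))$.

```python
import math

# Hypothetical signal parameters (q = 2); chosen for illustration only.
betas = [1.0, 0.5]
omegas = [0.7, 1.9]
phis = [0.3, -1.1]


def x(t):
    """Sinusoidal signal x_t = sum_k beta_k * cos(omega_k * t + phi_k)."""
    return sum(b * math.cos(om * t + ph) for b, om, ph in zip(betas, omegas, phis))


def r_x(tau):
    """Autocovariance of the signal: sum_k (beta_k^2 / 2) * cos(omega_k * tau)."""
    return sum(0.5 * b * b * math.cos(om * tau) for b, om in zip(betas, omegas))


def sample_cross_product(u, v, w, n):
    """n^{-1} * sum_{t=w}^{n} x_{t-u} * x_{t-v}."""
    return sum(x(t - u) * x(t - v) for t in range(w, n + 1)) / n


u, v, w, n = 2, 5, 6, 20000
estimate = sample_cross_product(u, v, w, n)
limit = r_x(v - u)
gap = abs(estimate - limit)  # shrinks at rate O(1/n) for distinct frequencies
```

The gap is governed by bounded oscillatory sums divided by $n$, which is the content of the two trigonometric limits used in the proof.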

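Proof. The first limit can be proved upon using the trigonometric identity (1.14) in connection with the fact that

$$\lim_{n\to\infty} n^{-1}\sum_{t=w}^{n}\cos(\omega_k(2t - u - v) + 2\phi_k) = 0 \quad\text{a.s.}$$

for any $k$ and

$$\lim_{n\to\infty} n^{-1}\sum_{t=w}^{n}\cos(\omega_k(t - u) + \phi_k)\cos(\omega_{k'}(t - v) + \phi_{k'}) = 0 \quad\text{a.s.}$$

for any $k \ne k'$. The second limit in the lemma is due to the strong ergodicity of $\{\epsilon_t\}$ (Hannan, 1970, pp. 203-204; Karlin and Taylor, 1975). Finally, to prove the last limit in the lemma, we note that

$$\left|\sum_{t=w}^{n}\epsilon_{t-u}\,x_{t-v}\right| \le \sum_{k=1}^{q}\beta_k\left\{\left|\sum_{t=w}^{n}\epsilon_{t-u}\cos(\omega_k t)\right| + \left|\sum_{t=w}^{n}\epsilon_{t-u}\sin(\omega_k t)\right|\right\}.$$

The assertion follows from the fact that (An, et al., 1983)

$$\lim_{n\to\infty} n^{-1}\left|\sum_{t=w}^{n}\epsilon_{t-u}\cos(\omega t)\right| = \lim_{n\to\infty} n^{-1}\left|\sum_{t=w}^{n}\epsilon_{t-u}\sin(\omega t)\right| = 0 \quad\text{a.s.}$$

for any $u$, $\omega$, and $w$. ◊

Equipped with this lemma, we now prove Theorem 3.4. Throughout the proof we drop the argument $\alpha$ for brevity.

Proof of Theorem 3.4. For any sequence $\{w_t\}$, let us define $w_t^+ := w_t$ for $t > 0$ and $w_t^+ := 0$ for $t \le 0$. Then, it is readily shown that

$$\left(\sum_{j=0}^{t+\tau-1} h_j\,y_{t+\tau-j}\right)\left(\sum_{j=0}^{t-1} g_j\,y_{t-j}\right) = I_1(t) + I_2(t) + I_3(t) + I_4(t)$$

where

$$I_1(t) := \sum_{u,v=0}^{\infty} h_u g_v\,x^+_{t+\tau-u}\,x^+_{t-v}, \qquad I_2(t) := \sum_{u,v=0}^{\infty} h_u g_v\,\epsilon^+_{t+\tau-u}\,x^+_{t-v},$$

$$I_3(t) := \sum_{u,v=0}^{\infty} h_u g_v\,\epsilon^+_{t-v}\,x^+_{t+\tau-u}, \quad\text{and}\quad I_4(t) := \sum_{u,v=0}^{\infty} h_u g_v\,\epsilon^+_{t+\tau-u}\,\epsilon^+_{t-v}.$$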

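For any fixed $N > 0$, we break up the double sum for $u$ and $v$ into four terms and obtain

$$\sum_{t=1}^{n} I_1(t) = \sum_{t=1}^{n}\left(\sum_{u=0}^{N-1}\sum_{v=0}^{N-1} + \sum_{u=N}^{\infty}\sum_{v=0}^{N-1} + \sum_{u=0}^{N-1}\sum_{v=N}^{\infty} + \sum_{u=N}^{\infty}\sum_{v=N}^{\infty}\right) h_u g_v\,x^+_{t+\tau-u}\,x^+_{t-v} =: U^N_{11} + U^N_{12} + U^N_{13} + U^N_{14}. \qquad (3.35)$$

(Here the index $n$ is omitted in the $U$’s for brevity.) Given $u$ and $v$, let $w = w(u, v) := \max(u - \tau, v) + 1$. Then, according to Lemma 3.5, we obtain

$$n^{-1}\sum_{t=1}^{n} x^+_{t+\tau-u}\,x^+_{t-v} = n^{-1}\sum_{t=w}^{n} x_{t+\tau-u}\,x_{t-v} \xrightarrow{a.s.} r^x_{\tau-u+v}$$

as $n \to \infty$ for any $0 \le u, v \le N - 1$. Moreover, let $c^h_u$ and $c^g_v$ be the constants in Definition 3.2 associated with $\{h_u\}$ and $\{g_v\}$. Then, it is readily shown that

$$\left|n^{-1}U^N_{11} - \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} h_u g_v\,r^x_{\tau-u+v}\right| \le \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} c^h_u c^g_v\left|n^{-1}\sum_{t=1}^{n} x^+_{t+\tau-u}\,x^+_{t-v} - r^x_{\tau-u+v}\right|$$

for any $\alpha \in A$. Therefore, we obtain

$$\lim_{n\to\infty} n^{-1}U^N_{11} = \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} h_u g_v\,r^x_{\tau-u+v} \quad\text{a.s.}$$

uniformly in $\alpha \in A$ for any $N$. This result, coupled with the uniform stability of the filters, implies that uniformly in $\alpha \in A$,

$$\lim_{N\to\infty}\lim_{n\to\infty} n^{-1}U^N_{11} = \sum_{k=1}^{q}\tfrac{1}{2}\beta_k^2\sum_{u=0}^{\infty}\sum_{v=0}^{\infty} h_u g_v\cos(\omega_k(\tau - u + v))$$

$$= \sum_{k=1}^{q}\tfrac{1}{2}\beta_k^2\,\Re\left\{\sum_{u=0}^{\infty}\sum_{v=0}^{\infty} h_u g_v\,e^{i(\tau-u+v)\omega_k}\right\} = \sum_{k=1}^{q}\tfrac{1}{2}\beta_k^2\,\Re\{H(\omega_k; \alpha)\,\overline{G(\omega_k; \alpha)}\,e^{i\tau\omega_k}\} \quad\text{a.s.}$$

In addition, since $|x_t| \le \beta$ almost surely, we have

$$n^{-1}|U^N_{12}| \le \beta^2 C_g\sum_{u=N}^{\infty} c^h_u$$

for all $\alpha \in A$ and all $n$, where $C_g := \sum_{v=0}^{\infty} c^g_v$. Therefore, it follows that

$$\lim_{N\to\infty}\lim_{n\to\infty} n^{-1}\sup_{\alpha\in A}|U^N_{12}| = 0 \quad\text{a.s.}$$

The same result can be proved for $U^N_{13}$ and $U^N_{14}$ by similar arguments. Combining these results yields

$$n^{-1}\sum_{t=1}^{n} I_1(t) \xrightarrow{a.s.} \sum_{k=1}^{q}\tfrac{1}{2}\beta_k^2\,\Re\{H(\omega_k; \alpha)\,\overline{G(\omega_k; \alpha)}\,e^{i\tau\omega_k}\} \qquad (3.36)$$

uniformly in $\alpha \in A$ as $n \to \infty$.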

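Now let us consider $I_2(t)$. Using the same technique as in (3.35), we write

$$\sum_{t=1}^{n} I_2(t) = \sum_{t=1}^{n}\sum_{u=0}^{\infty}\sum_{v=0}^{\infty} h_u g_v\,\epsilon^+_{t+\tau-u}\,x^+_{t-v} = U^N_{21} + U^N_{22} + U^N_{23} + U^N_{24}.$$

For any $\alpha \in A$, it is easy to verify that

$$n^{-1}|U^N_{21}| \le \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} c^h_u c^g_v\left|n^{-1}\sum_{t=1}^{n}\epsilon^+_{t+\tau-u}\,x^+_{t-v}\right|.$$

Moreover, for any $0 \le u, v \le N - 1$, Lemma 3.5 guarantees that

$$n^{-1}\sum_{t=1}^{n}\epsilon^+_{t+\tau-u}\,x^+_{t-v} = n^{-1}\sum_{t=w}^{n}\epsilon_{t+\tau-u}\,x_{t-v} \xrightarrow{a.s.} 0$$

as $n \to \infty$. Therefore, we obtain

$$\lim_{N\to\infty}\lim_{n\to\infty} n^{-1}\sup_{\alpha\in A}|U^N_{21}| = 0 \quad\text{a.s.}$$

Furthermore, let us define the random process

$$\zeta_N(t) := \beta\,C_g\sum_{u=N}^{\infty} c^h_u\,|\epsilon_{t+\tau-u}| \qquad (3.37)$$

for any fixed $N$. Since $\zeta_N(t) \ge 0$ and, by the monotone convergence theorem,

$$E\{\zeta_N(t)\} = \beta\,C_g\,E|\epsilon_0|\sum_{u=N}^{\infty} c^h_u < \infty,$$

the infinite sum in (3.37) converges almost surely so that $\zeta_N(t)$ is well-defined. Moreover, it is easy to show that

$$|U^N_{22}| \le \sum_{t=1}^{n}\zeta_N(t)$$

for all $\alpha \in A$. Since $\{\zeta_N(t)\}$ is strictly stationary for each fixed $N$, according to the strong ergodic theorem (Karlin and Taylor, 1975), there exists a random variable $\zeta_N$ such that

$$\lim_{n\to\infty} n^{-1}\sum_{t=1}^{n}\zeta_N(t) = \zeta_N \quad\text{a.s.}$$

By the same theorem, the expected value of $\zeta_N$ can be written as

$$E(\zeta_N) = E\{\zeta_N(0)\} = \beta\,C_g\,E|\epsilon_0|\sum_{u=N}^{\infty} c^h_u$$

for any fixed $N$. Since $\zeta_N \ge 0$ almost surely and

$$\sum_{N=0}^{\infty} E(\zeta_N) = \beta\,C_g\,E|\epsilon_0|\sum_{N=0}^{\infty}\sum_{u=N}^{\infty} c^h_u = \beta\,C_g\,E|\epsilon_0|\sum_{u=0}^{\infty}(u + 1)\,c^h_u < \infty,$$

it is readily shown by using Markov’s inequality that

$$\Pr\left\{\bigcup_{N=N'}^{\infty}(\zeta_N > \varepsilon)\right\} \le \sum_{N=N'}^{\infty}\Pr(\zeta_N > \varepsilon) \le \varepsilon^{-1}\sum_{N=N'}^{\infty} E(\zeta_N) \to 0$$

as $N' \to \infty$ for any $\varepsilon > 0$. According to the Borel-Cantelli lemma, this implies that $\Pr\{(\zeta_N > \varepsilon)\ \text{i.o.}\} = 0$ for any $\varepsilon > 0$ and hence that $\zeta_N \xrightarrow{a.s.} 0$ as $N \to \infty$. Consequently, we obtain

$$\lim_{N\to\infty}\limsup_{n\to\infty} n^{-1}\sup_{\alpha\in A}|U^N_{22}| = 0 \quad\text{a.s.} \qquad (3.38)$$

By a similar argument, the same result can also be proved for $U^N_{23}$ and $U^N_{24}$. It is therefore concluded that

$$n^{-1}\sum_{t=1}^{n} I_2(t) \xrightarrow{a.s.} 0 \qquad (3.39)$$

uniformly in $\alpha \in A$ as $n \to \infty$. Upon noting the similarity between $I_2(t)$ and $I_3(t)$, it is not surprising that the same result also holds for $I_3(t)$, namely,

$$n^{-1}\sum_{t=1}^{n} I_3(t) \xrightarrow{a.s.} 0 \qquad (3.40)$$

uniformly in $\alpha \in A$ as $n \to \infty$.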


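Finally let us consider $I_4(t)$ for which we can write

$$\sum_{t=1}^{n} I_4(t) = \sum_{t=1}^{n}\sum_{u=0}^{\infty}\sum_{v=0}^{\infty} h_u g_v\,\epsilon^+_{t+\tau-u}\,\epsilon^+_{t-v} = U^N_{41} + U^N_{42} + U^N_{43} + U^N_{44}$$

in the same way as (3.35). Given $0 \le u, v \le N - 1$, Lemma 3.5 implies that

$$n^{-1}\sum_{t=1}^{n}\epsilon^+_{t+\tau-u}\,\epsilon^+_{t-v} = n^{-1}\sum_{t=w}^{n}\epsilon_{t+\tau-u}\,\epsilon_{t-v} \xrightarrow{a.s.} r^\epsilon_{\tau-u+v}$$

as $n \to \infty$. Moreover, since

$$\left|n^{-1}U^N_{41} - \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} h_u g_v\,r^\epsilon_{\tau-u+v}\right| \le \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} c^h_u c^g_v\left|n^{-1}\sum_{t=1}^{n}\epsilon^+_{t+\tau-u}\,\epsilon^+_{t-v} - r^\epsilon_{\tau-u+v}\right|,$$

we obtain, upon noting the uniform stability of the filters,

$$\lim_{N\to\infty}\lim_{n\to\infty} n^{-1}U^N_{41} = \sum_{u=0}^{\infty}\sum_{v=0}^{\infty} h_u g_v\,r^\epsilon_{\tau-u+v} = \int_{-\pi}^{\pi} H(\omega; \alpha)\,\overline{G(\omega; \alpha)}\,e^{i\tau\omega}\,dF(\omega)$$

uniformly in $\alpha \in A$. To show that the remaining terms in $I_4(t)$ are negligible, we observe that for all $\alpha \in A$,

$$|U^N_{42}| \le \sum_{t=1}^{n} Z_N(t)$$

where $\{Z_N(t)\}$ is a strictly stationary process as specified by

$$Z_N(t) := \sum_{u=N}^{\infty}\sum_{v=0}^{\infty} c^h_u c^g_v\,|\epsilon_{t+\tau-u}\,\epsilon_{t-v}|.$$

The infinite sum in this expression converges almost surely, since $Z_N(t) \ge 0$ and

$$E\{Z_N(t)\} = \sum_{u=N}^{\infty}\sum_{v=0}^{\infty} c^h_u c^g_v\,E|\epsilon_{t+\tau-u}\,\epsilon_{t-v}| \le C_g\,r^\epsilon_0\sum_{u=N}^{\infty} c^h_u < \infty$$

according to the monotone convergence theorem. By an argument similar to the one employed earlier for the proof of (3.38), we obtain

$$\lim_{N\to\infty}\limsup_{n\to\infty} n^{-1}\sup_{\alpha\in A}|U^N_{42}| = 0 \quad\text{a.s.}$$

The same result is also true for $U^N_{43}$ and $U^N_{44}$. Therefore, we conclude that uniformly in $\alpha \in A$,

$$n^{-1}\sum_{t=1}^{n} I_4(t) \xrightarrow{a.s.} \int_{-\pi}^{\pi} H(\omega; \alpha)\,\overline{G(\omega; \alpha)}\,e^{i\tau\omega}\,dF(\omega) \qquad (3.41)$$

as $n \to \infty$. Notice that the last quantity is a real number because of the symmetry of $G(\omega; \alpha)$, $H(\omega; \alpha)$, and $F(\omega)$ in $\omega$. Collecting (3.36), (3.39), (3.40), and (3.41) completes the proof. ◊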

Remark 3.7 In the proof of Theorem 3.4, we assume that the $\phi_k$ are random. The same result remains valid if the $\phi_k$ are constants, since Lemma 3.5 holds under the assumption of constant phases.

As a direct consequence of Theorem 3.4, we now claim the uniform strong consistency of the sample autocovariances $\hat r_\tau(\alpha)$ in (3.33).

Theorem 3.5 Suppose that $\{h_j(\alpha)\}$ is uniformly strictly stable. Then, as $n \to \infty$, $\hat r_\tau(\alpha) \xrightarrow{a.s.} r^y_\tau(\alpha)$ uniformly in $\alpha \in A$ for any $\tau \ge 0$.

Proof. It follows from Theorem 3.4 with $\{g_j\}$ replaced by $\{h_j\}$ and $n$ by $n - \tau$. ◊

The uniform strong consistency can also be proved for the autocovariances

$$\hat r_\tau(\alpha) = n^{-1}\sum_{t=u}^{n+v}\hat y_{t+\tau}(\alpha)\,\hat y_t(\alpha)$$

where the integers $u$ and $v$ are independent of $n$.

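Corollary 3.2 If $\{h_j(\alpha)\}$ is uniformly strictly stable, then, as $n \to \infty$, $\hat r_\tau(\alpha) \xrightarrow{a.s.} r^y_\tau(\alpha)$ uniformly in $\alpha \in A$ for any $\tau \ge 0$.

Proof. According to Theorem 3.4, it suffices to show that

$$n^{-1}\sum_{t=1}^{u-1}\hat y_{t+\tau}(\alpha)\,\hat y_t(\alpha)$$

vanishes almost surely and uniformly in $\alpha \in A$. (Here we assume $u > 1$ without loss of generality.) To this end, let $\{c_u\}$ be the constant sequence associated with $\{h_u(\alpha)\}$ in Definition 3.2, and let $C_h := \sum_u c_u$. Writing $W_t := \sum_{j=0}^{t-1} c_j\,|\epsilon_{t-j}|$ for the bound on the filtered noise, so that $|\hat y_t(\alpha)| \le \beta C_h + W_t$, it is easy to show that

$$n^{-1}\sum_{t=1}^{u-1}|\hat y_{t+\tau}(\alpha)\,\hat y_t(\alpha)| \le n^{-1}\sum_{t=1}^{u-1}\{\beta^2 C_h^2 + \beta C_h\,(W_{t+\tau} + W_t) + W_{t+\tau}W_t\}$$

for all $\alpha \in A$. The assertion follows immediately since the sum on the right-hand side does not depend on $n$. ◊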

Remark 3.8 For the original (unfiltered) process $\{y_t\}$ in (1.1), we have

$$\hat r_\tau = n^{-1}\sum_{t=u}^{n+v} y_{t+\tau}\,y_t \xrightarrow{a.s.} r^y_\tau$$

for any fixed $\tau$, $u$, and $v$. This corresponds to the trivial case in Corollary 3.2 where $h_0(\alpha) = 1$ and $h_j(\alpha) = 0$ for all $j > 0$.
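The uniform consistency of Theorem 3.5 can be illustrated by simulation. The sketch below is not from the source: it assumes a single sinusoid in Gaussian white noise and the one-pole filter $h_j(\alpha) = \alpha^j$ on the hypothetical parameter set $|\alpha| \le 0.5$, for which the limit is $r^y_\tau(\alpha) = \tfrac{1}{2}\beta^2|H(\omega; \alpha)|^2\cos(\omega\tau) + \sigma^2\alpha^\tau/(1 - \alpha^2)$ with $|H(\omega; \alpha)|^2 = 1/(1 - 2\alpha\cos\omega + \alpha^2)$.

```python
import math
import random


def filtered_autocov(y, alpha, tau):
    """r_hat_tau(alpha) of (3.33) for the one-pole filter h_j(alpha) = alpha**j.

    The recursion yhat_t = alpha * yhat_{t-1} + y_t with yhat_0 = 0
    reproduces the truncated sum in (3.32).
    """
    yhat, prev = [], 0.0
    for value in y:
        prev = alpha * prev + value
        yhat.append(prev)
    n = len(yhat)
    return sum(yhat[t + tau] * yhat[t] for t in range(n - tau)) / n


def limit_autocov(alpha, tau, beta, omega, sigma2):
    """Limit r_tau^y(alpha): filtered-sinusoid term plus filtered-noise term."""
    gain2 = 1.0 / (1.0 - 2.0 * alpha * math.cos(omega) + alpha * alpha)
    signal = 0.5 * beta * beta * gain2 * math.cos(omega * tau)
    noise = sigma2 * alpha ** tau / (1.0 - alpha * alpha)
    return signal + noise


random.seed(0)
beta, omega, phi, sigma2, n = 1.0, 1.0, 0.4, 1.0, 20000
y = [beta * math.cos(omega * t + phi) + random.gauss(0.0, math.sqrt(sigma2))
     for t in range(1, n + 1)]

# Largest deviation over a grid of filter parameters and lags.
grid = [-0.5, -0.25, 0.0, 0.25, 0.5]
worst = max(abs(filtered_autocov(y, a, tau) - limit_autocov(a, tau, beta, omega, sigma2))
            for a in grid for tau in (0, 1, 2))
```

The quantity `worst` approximates the supremum over $A$ on a coarse grid; it should decrease as `n` grows, in line with the theorem.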

Suppose that the filter $\{h_j(\alpha)\}$ in (3.32) is differentiable with respect to $\alpha$ and the derivative is also uniformly strictly stable. The following theorem claims that the uniform strong consistency remains valid for the derivative of $\hat r_\tau(\alpha)$.

Theorem 3.6 Suppose that $\{h_j(\alpha)\}$ is differentiable with respect to $\alpha$, and that both $\{h_j(\alpha)\}$ and its derivative are uniformly strictly stable. Then, in addition to the uniform strong consistency of $\hat r_\tau(\alpha)$ as claimed in Theorem 3.5, it is also true that $\partial\hat r_\tau(\alpha)/\partial\alpha \xrightarrow{a.s.} \partial r^y_\tau(\alpha)/\partial\alpha$ uniformly in $\alpha \in A$ as $n \to \infty$.


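Proof. From (3.32) and (3.33), it is easy to verify that

$$\frac{\partial\hat r_\tau(\alpha)}{\partial\alpha} = n^{-1}\sum_{t=1}^{n-\tau}\left(\sum_{j=0}^{t+\tau-1} h'_j(\alpha)\,y_{t+\tau-j}\right)\left(\sum_{j=0}^{t-1} h_j(\alpha)\,y_{t-j}\right) + n^{-1}\sum_{t=1}^{n-\tau}\left(\sum_{j=0}^{t+\tau-1} h_j(\alpha)\,y_{t+\tau-j}\right)\left(\sum_{j=0}^{t-1} h'_j(\alpha)\,y_{t-j}\right)$$

and

$$\frac{\partial r^y_\tau(\alpha)}{\partial\alpha} = \sum_{k=1}^{q}\tfrac{1}{2}\beta_k^2\,\Re\{(H'_k\overline{H_k} + H_k\overline{H'_k})\,e^{i\tau\omega_k}\} + \int_{-\pi}^{\pi}(H'\overline{H} + H\overline{H'})\,e^{i\tau\omega}\,dF(\omega)$$

where $H_k$, $H$, $H'_k$, and $H'$ are short-hand notation for $H(\omega_k; \alpha)$, $H(\omega; \alpha)$, $H'(\omega_k; \alpha)$, and $H'(\omega; \alpha)$, respectively, and the prime stands for the differentiation with respect to $\alpha$. The proof is completed by applying Theorem 3.4 to these quantities followed by an argument similar to the proof of Theorem 3.5. ◊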




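As a corollary, we obtain the uniform strong consistency of the sample autocorrelation $\hat\rho_\tau(\alpha) := \hat r_\tau(\alpha)/\hat r_0(\alpha)$ and its derivative $\partial\hat\rho_\tau(\alpha)/\partial\alpha$ as follows.

Corollary 3.3 Under the conditions in Theorem 3.6, if $r^y_0(\alpha) > 0$ for all $\alpha \in A$, then, as $n \to \infty$, $\hat\rho_\tau(\alpha) \xrightarrow{a.s.} \rho^y_\tau(\alpha)$ and $\partial\hat\rho_\tau(\alpha)/\partial\alpha \xrightarrow{a.s.} \partial\rho^y_\tau(\alpha)/\partial\alpha$ uniformly in $\alpha \in A$, where $\rho^y_\tau(\alpha) := r^y_\tau(\alpha)/r^y_0(\alpha)$ is the autocorrelation function of $\{y_t(\alpha)\}$.

Proof. Since $r^y_0(\alpha) > 0$ for all $\alpha \in A$, the uniform strong consistency in Theorem 3.5 implies that $\hat r_0(\alpha) > 0$ almost surely for all $\alpha \in A$, provided that $n$ is sufficiently large. Therefore, $\hat\rho_\tau(\alpha)$ is well-defined and differentiable for all $\alpha \in A$ and large $n$. Moreover, we have

$$\frac{\partial\hat\rho_\tau(\alpha)}{\partial\alpha} = \frac{\hat r_0(\alpha)\,\partial\hat r_\tau(\alpha)/\partial\alpha - \hat r_\tau(\alpha)\,\partial\hat r_0(\alpha)/\partial\alpha}{(\hat r_0(\alpha))^2}.$$

The assertion follows immediately from Theorem 3.5 and Theorem 3.6. ◊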

3.4  More  Results  on  Uniform  Strong  Consistency 

The uniform strong consistency results in Theorem 3.4 and Theorem 3.5 can be extended to include some more general signal-plus-noise models.

Let us consider $\{y_t(\alpha)\}$ which obeys the following parametric model

$$y_t(\alpha) = \sum_{j=0}^{\infty} h_j(\alpha)\,x_{t-j} + \sum_{j=0}^{\infty} g_j(\alpha)\,\epsilon_{t-j} \qquad (t = 0, \pm 1, \pm 2, \ldots) \qquad (3.42)$$

where $\{x_t\}$ and $\{\epsilon_t\}$ are zero-mean, strictly stationary, and mutually independent, with finite second moments. As we can see from the proof of (3.39), (3.41), and Theorem 3.5, in order to show that the sample autocovariance function of $\{y_t(\alpha)\}$ converges to

$$r^y_\tau(\alpha) := E\{y_{t+\tau}(\alpha)\,y_t(\alpha)\} = \sum_{u,v=0}^{\infty} h_u(\alpha)\,h_v(\alpha)\,r^x_{\tau-u+v} + \sum_{u,v=0}^{\infty} g_u(\alpha)\,g_v(\alpha)\,r^\epsilon_{\tau-u+v} \qquad (3.43)$$

almost surely and uniformly in $\alpha$, it is sufficient to require that the almost sure limits in Lemma 3.5 exist for any $u$, $v$, and $w$, and that both $\{h_j(\alpha)\}$ and $\{g_j(\alpha)\}$ be uniformly strictly stable². Therefore, we immediately obtain the following results.

Theorem 3.7 Let the parametric process $\{y_t(\alpha)\}$ be defined by (3.42) for $\alpha \in A$, with $\{x_t\}$ and $\{\epsilon_t\}$ being any zero-mean, strictly stationary, and mutually independent processes for which Lemma 3.5 holds. If $\{h_j(\alpha)\}$ and $\{g_j(\alpha)\}$ are uniformly strictly stable in $A$, then

$$n^{-1}\sum_{t=1}^{n} y_{t+\tau}(\alpha)\,y_t(\alpha) \xrightarrow{a.s.} r^y_\tau(\alpha)$$

uniformly in $\alpha \in A$ as $n \to \infty$ for any $\tau \ge 0$, where $r^y_\tau(\alpha)$ is given by (3.43).

The uniform consistency in Theorem 3.7 is of importance in applications where the inference of the model parameter $\alpha$ relies on the sample autocovariance function of the observed process $\{y_t(\alpha)\}$. Generalization of this result to higher-order sample moments and cumulants is quite straightforward. All we need is to assume that the corresponding higher-order sample moments of $\{x_t\}$ and $\{\epsilon_t\}$ are strongly consistent, giving rise to results similar to those in Lemma 3.5. We have noticed a recent attempt³ to prove the uniform strong consistency of sample cumulants from a process similar to $\{y_t(\alpha)\}$. Unfortunately, the proof was based on a false lemma which was incorrectly cited from a wrong result of Ljung (1987, Appendix 2B).

Under the same assumption about $\{x_t\}$ and $\{\epsilon_t\}$ as in Theorem 3.7, we can also prove the uniform strong consistency for the process $y_t = x_t + \epsilon_t$ after parametric filtering. Define, as before, the filtered data by

$$\hat y_t(\alpha) := \sum_{j=0}^{t-1} h_j(\alpha)\,y_{t-j} = \sum_{j=0}^{\infty} h_j(\alpha)\,(x^+_{t-j} + \epsilon^+_{t-j}). \qquad (3.44)$$

² The proof goes through with $x_t$ and $\epsilon_t$ in place of $x^+_t$ and $\epsilon^+_t$, respectively.

³ Giannakis, G. B. and Tsatsanis, M. K. (1992). A unifying maximum-likelihood view of cumulant and polyspectral measures for non-Gaussian signal classification and estimation. IEEE Trans. Inform. Theory, vol. 38, no. 2, part I, pp. 386-406.


Following  the  same  proof  of  (3.39),  (3.41),  and  Theorem  3.5,  we  obtain  the  following 
results. 

Theorem 3.8 Suppose that the conditions about $\{x_t\}$ and $\{\epsilon_t\}$ in Theorem 3.7 are satisfied, and let $\{\hat y_t(\alpha)\}$ be defined by (3.44). If the filter $\{h_j(\alpha)\}$ is uniformly strictly stable in $A$, then, as $n \to \infty$,

$$n^{-1}\sum_{t=1}^{n-\tau}\hat y_{t+\tau}(\alpha)\,\hat y_t(\alpha) \xrightarrow{a.s.} r^y_\tau(\alpha)$$

uniformly in $\alpha \in A$ for any $\tau \ge 0$, where $r^y_\tau(\alpha)$ is the autocovariance function in (3.43) with $g_j$ replaced by $h_j$.

Remark 3.9 The assumptions in Theorem 3.7 are clearly satisfied if $\{x_t\}$ and $\{\epsilon_t\}$ are both linear processes (Hannan, 1970, p. 204).




Chapter  4 


Frequency  Estimation  by  Parametric  Filtering 


As we have seen in Chapter 2, Prony’s estimator leads to inconsistent frequency estimates. To remedy this, we propose in this chapter a general procedure of iterative parametric filtering, the parametric filtering (PF) method, that can be shown to produce consistent frequency estimates. The idea is to judiciously parametrize the filter so that it satisfies a certain parametrization property for all parameters in a neighborhood of the true AR parameter $\mathbf{a}$; the clue for the correct parametrization comes from the particular form of the bias encountered by Prony’s estimator. For any filter which satisfies the parametrization property, we define the PF estimator of the AR parameter $\mathbf{a}$ as the multivariate fixed-point of the parametrized least squares estimator from the filtered data. Under certain mild conditions, the least squares estimator, as a function of the filter parameter, is shown to constitute a contractive mapping so that the PF estimator exists as its fixed-point. We also consider an iterative algorithm that calculates the PF estimator and show its almost sure convergence to the PF estimator. Moreover, the PF estimator is shown to be strongly consistent and asymptotically normal for estimating the AR parameter $\mathbf{a}$. These results solidify the theoretical foundation of the PF method as a promising procedure of frequency estimation.




4.1  Parametric  Filtering  Method 


Our basic idea of eliminating the bias in Prony’s estimator rests upon estimating the AR parameter $\mathbf{a}$ in (2.4), not directly from the original data as done in Chapter 2, but from the filtered data, using an appropriate parametric filter. As we shall see, the least squares estimator from the filtered data can be made consistent by iteratively “tuning” the filter parameter according to the previously obtained AR estimate.

4.1.1  AR  Estimation  After  Parametric  Filtering 

Let us consider a linear time-invariant causal filter with impulse response sequence $\{h_j(\alpha), j = 0, 1, \ldots\}$, where $\alpha := [\alpha_1, \ldots, \alpha_q]^T$ is the filter parameter which takes on values in an open set $A$ that contains the AR parameter $\mathbf{a}$. Assume that for any $\alpha \in A$ the filter is stable, i.e., $\sum_j |h_j(\alpha)| < \infty$. In addition, suppose that the filter passes all of the sinusoidal components in $\{x_t\}$, namely,

(A1) $H(\omega_k; \alpha) \ne 0$ for $k = 1, \ldots, q$ and for all $\alpha \in A$,

where $H(\omega; \alpha)$ is the transfer function of the filter. We apply this filter to $\{y_t\}$ and obtain the filtered process

$$y_t(\alpha) := \sum_{j=0}^{\infty} h_j(\alpha)\,y_{t-j}.$$

Similarly, we define the filtered signal $\{x_t(\alpha)\}$ and the filtered noise $\{\epsilon_t(\alpha)\}$.

The key fact to observe is that the filtered signal $\{x_t(\alpha)\}$ remains a sum of $q$ sinusoids with the same frequencies, and hence a solution to the same homogeneous AR equation (2.3). In fact, according to the theory of linear filtering (Brockwell and Davis, 1987), the filtered signal $\{x_t(\alpha)\}$ can be expressed as

$$x_t(\alpha) = \sum_{k=1}^{q}\beta_k(\alpha)\cos(\omega_k t + \phi_k(\alpha))$$

where $\beta_k(\alpha) := \beta_k\,|H(\omega_k; \alpha)|$ and $\phi_k(\alpha) := \phi_k + \arg\{H(\omega_k; \alpha)\}$ are the amplitudes and phases of the filtered sinusoids. Since the AR equation (2.3) is completely determined by the unchanged frequencies $\omega_k$, we still obtain $A(z^{-1})\,x_t(\alpha) = 0$ for all $t$. It is quite clear that the original problem of frequency estimation is not altered by the linear filtering and can be equivalently stated as that of estimating the same AR parameter $\mathbf{a}$ in terms of the filtered processes.

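Let $\hat{\mathbf{a}}_{LS}(\alpha)$ be the least squares estimator of $\mathbf{a}$ on the basis of $\{y_1(\alpha), \ldots, y_n(\alpha)\}$. Since $\{y_t(\alpha)\}$ still consists of a sinusoidal signal in additive noise that can be modeled as a linear process, the strong consistency of the sample autocovariances remains valid for $\{y_t(\alpha)\}$, as indicated by the large sample theory in Chapter 3 (see Remark 3.9 with $y_t$ replaced by $y_t(\alpha)$). Therefore, similarly to (2.11), the least squares estimator $\hat{\mathbf{a}}_{LS}(\alpha)$ converges almost surely to the deterministic vector

$$\mathbf{a}(\alpha) := -\mathbf{R}_y^{-1}(\alpha)\,\mathbf{r}_y(\alpha) \qquad (4.1)$$

which, as claimed in Lemma 2.2, can be written as

$$\mathbf{a}(\alpha) = \mathbf{a} - \mathbf{R}_y^{-1}(\alpha)\,\{\mathbf{R}_\epsilon(\alpha)\,\mathbf{a} + \mathbf{r}_\epsilon(\alpha)\} \qquad (4.2)$$

where the autocovariances are defined from the corresponding filtered processes.

Since $\mathbf{R}_y(\alpha) = \mathbf{R}_x(\alpha) + \mathbf{R}_\epsilon(\alpha)$, multiplying each side of (4.2) by $\mathbf{R}_y(\alpha)$ gives

$$\mathbf{R}_x(\alpha)\,\mathbf{a}(\alpha) + \mathbf{R}_\epsilon(\alpha)\,\mathbf{a}(\alpha) = \mathbf{R}_x(\alpha)\,\mathbf{a} - \mathbf{r}_\epsilon(\alpha). \qquad (4.3)$$

Note that $\mathbf{R}_x(\alpha)$ is nonsingular if all of the $q$ sinusoids are retained after the filtering. In fact $\mathbf{R}_x(\alpha)$ admits a decomposition similar to that in Lemma 2.3 and Remark 2.1 with $\mathbf{P}$ in (2.20) replaced by $\mathbf{P}(\alpha) := \tfrac{1}{2}\,\mathrm{diag}(\beta_1^2(\alpha), \ldots, \beta_q^2(\alpha))$. Under (A1), $|H(\omega_k; \alpha)| > 0$ for all $k$. It follows that $\beta_k(\alpha) > 0$ for all $k$, and hence $\mathbf{P}(\alpha)$ is nonsingular. An argument similar to the proof of Lemma 2.3 confirms the nonsingularity of $\mathbf{R}_x(\alpha)$. Armed with this fact and (4.3), the bias $\mathbf{a}(\alpha) - \mathbf{a}$ can be written as

$$\mathbf{a}(\alpha) - \mathbf{a} = -\mathbf{R}_x^{-1}(\alpha)\,\{\mathbf{R}_\epsilon(\alpha)\,\mathbf{a}(\alpha) + \mathbf{r}_\epsilon(\alpha)\}. \qquad (4.4)$$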
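For readers who want the intermediate algebra, the step from (4.3) to (4.4) can be spelled out as follows. This is only a rearrangement of the equations stated in the text, written with the bold symbols $\mathbf{a}$, $\mathbf{R}_x$, $\mathbf{R}_\epsilon$, and $\mathbf{r}_\epsilon$ for the AR parameter and the autocovariance matrices and vectors of the filtered signal and noise.

```latex
\begin{align*}
\mathbf{R}_x(\alpha)\,\mathbf{a}(\alpha) + \mathbf{R}_\epsilon(\alpha)\,\mathbf{a}(\alpha)
  &= \mathbf{R}_x(\alpha)\,\mathbf{a} - \mathbf{r}_\epsilon(\alpha)
  && \text{(4.3)} \\
\mathbf{R}_x(\alpha)\,\{\mathbf{a}(\alpha) - \mathbf{a}\}
  &= -\{\mathbf{R}_\epsilon(\alpha)\,\mathbf{a}(\alpha) + \mathbf{r}_\epsilon(\alpha)\}
  && \text{(collect the } \mathbf{R}_x \text{ terms)} \\
\mathbf{a}(\alpha) - \mathbf{a}
  &= -\mathbf{R}_x^{-1}(\alpha)\,\{\mathbf{R}_\epsilon(\alpha)\,\mathbf{a}(\alpha) + \mathbf{r}_\epsilon(\alpha)\}
  && \text{(}\mathbf{R}_x(\alpha)\text{ nonsingular under (A1))}
\end{align*}
```

The last line makes it visible that the bias vanishes exactly when the bracketed noise term is zero.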

Unlike what we had before when the least squares (Prony’s) estimator was obtained from the original data, the asymptotic bias of $\hat{\mathbf{a}}_{LS}(\alpha)$ is now a function of the filter parameter $\alpha$. This, as we shall see, makes it possible to fulfill our goal of eliminating the bias via appropriate parametric filtering.

4.1.2  PF  Method  of  Frequency  Estimation 

As suggested by (4.4), if there exists a filter parameter $\alpha^*$ in $A$ such that

$$\mathbf{a}(\alpha^*) = -\mathbf{R}_\epsilon^{-1}(\alpha^*)\,\mathbf{r}_\epsilon(\alpha^*), \qquad (4.5)$$

then from (4.4) we would obtain $\mathbf{a}(\alpha^*) = \mathbf{a}$. This implies that after filtering with $\alpha^*$, the least squares estimator $\hat{\mathbf{a}}_{LS}(\alpha^*)$ becomes consistent for estimating the AR parameter $\mathbf{a}$.

Suppose now that the autocorrelation function of the noise $\{\epsilon_t\}$ is known, and, for convenience, that the filter is parametrized so that

(A2) $\alpha = -\mathbf{R}_\epsilon^{-1}(\alpha)\,\mathbf{r}_\epsilon(\alpha)$ for all $\alpha \in A$.

In this case, the filter is said to possess the parametrization property. It is interesting to note the similarity between (A2) and the Yule-Walker equations. In fact, for an AR($2q$) process with AR coefficients satisfying the symmetry condition (2.2), the vector of the first $q$ free coefficients will be the solution of (A2), with $\mathbf{R}_\epsilon(\alpha)$ and $\mathbf{r}_\epsilon(\alpha)$ being replaced by their counterparts defined from the autocovariances of that AR process.
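The parametrization property can be checked empirically in the simplest setting. The sketch below rests on several assumptions that are not in this section: $q = 1$, white Gaussian noise, a hypothetical AR(2) filter $z_t = -\theta_1 z_{t-1} - \eta^2 z_{t-2} + y_t$ with $\theta_1 = \tfrac{1}{2}(1 + \eta^2)\alpha$, and a least squares estimate obtained by minimizing $\sum_t (z_t + a z_{t-1} + z_{t-2})^2$. With these choices the filtered noise has lag-one autocorrelation $-\alpha/2$, so the least squares estimate should reproduce $\alpha$ when no sinusoid is present.

```python
import random


def ar2_filter(y, alpha, eta):
    """Hypothetical AR(2) filter: z_t = -theta1 * z_{t-1} - eta^2 * z_{t-2} + y_t,
    with theta1 = (1 + eta^2) * alpha / 2 and zero initial conditions."""
    theta1 = 0.5 * (1.0 + eta * eta) * alpha
    z1 = z2 = 0.0
    out = []
    for value in y:
        z = -theta1 * z1 - eta * eta * z2 + value
        out.append(z)
        z1, z2 = z, z1
    return out


def ls_estimate(z):
    """Least squares a in z_t + a * z_{t-1} + z_{t-2} = 0 (case q = 1)."""
    num = sum(z[t - 1] * (z[t] + z[t - 2]) for t in range(2, len(z)))
    den = sum(z[t - 1] ** 2 for t in range(2, len(z)))
    return -num / den


random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(50000)]  # no sinusoid present
eta = 0.8
gaps = [abs(ls_estimate(ar2_filter(noise, alpha, eta)) - alpha)
        for alpha in (-1.0, 0.0, 1.0)]
```

Each entry of `gaps` measures how far the noise-only least squares estimate is from the filter parameter that produced it; under (A2) these gaps shrink as the sample size grows.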

Equipped  with  the  parametrization  property,  we  have  the  following  theorem. 

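Theorem 4.1 Suppose that the filter $\{h_j(\alpha)\}$ satisfies (A1) and (A2). Then $\alpha^* = \mathbf{a}$ is the unique fixed-point of the deterministic mapping $\mathbf{a}(\alpha)$ in $A$.

Proof. From (4.2) and (A2), we find that

$$\mathbf{a}(\alpha) - \mathbf{a} = \mathbf{C}(\alpha)\,(\alpha - \mathbf{a}) \qquad (4.6)$$

where

$$\mathbf{C}(\alpha) := \mathbf{R}_y^{-1}(\alpha)\,\mathbf{R}_\epsilon(\alpha). \qquad (4.7)$$

Since $\mathbf{C}(\alpha)$ is nonsingular and $\mathbf{a} \in A$, the AR parameter $\mathbf{a}$ is a fixed-point of $\mathbf{a}(\alpha)$. On the other hand, since the parametrization property (A2) is satisfied, (4.5) is equivalent to $\mathbf{a}(\alpha^*) = \alpha^*$. This implies that $\alpha^*$, if it exists, is also a fixed-point of $\mathbf{a}(\alpha)$. It also implies that $\alpha^* = \mathbf{a}$ since $\mathbf{a}(\alpha^*) = \mathbf{a}$. The theorem is thus proved. ◊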
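The identity (4.6) and the uniqueness claim can be verified in a few lines. The following is a sketch of the algebra, using only (4.2), (A2), and the decomposition $\mathbf{R}_y = \mathbf{R}_x + \mathbf{R}_\epsilon$ stated earlier; the symbol $\mathbf{I}$ for the identity matrix is introduced here for convenience.

```latex
\begin{align*}
\mathbf{a}(\alpha) - \mathbf{a}
  &= -\mathbf{R}_y^{-1}(\alpha)\,\{\mathbf{R}_\epsilon(\alpha)\,\mathbf{a} + \mathbf{r}_\epsilon(\alpha)\}
  && \text{by (4.2)} \\
  &= -\mathbf{R}_y^{-1}(\alpha)\,\{\mathbf{R}_\epsilon(\alpha)\,\mathbf{a} - \mathbf{R}_\epsilon(\alpha)\,\alpha\}
  && \text{since } \mathbf{r}_\epsilon(\alpha) = -\mathbf{R}_\epsilon(\alpha)\,\alpha \text{ by (A2)} \\
  &= \mathbf{R}_y^{-1}(\alpha)\,\mathbf{R}_\epsilon(\alpha)\,(\alpha - \mathbf{a})
   = \mathbf{C}(\alpha)\,(\alpha - \mathbf{a}). \\[4pt]
\text{If } \mathbf{a}(\alpha) = \alpha:\quad
\mathbf{0} &= \{\mathbf{I} - \mathbf{C}(\alpha)\}\,(\alpha - \mathbf{a})
   = \mathbf{R}_y^{-1}(\alpha)\,\mathbf{R}_x(\alpha)\,(\alpha - \mathbf{a})
  && \Longrightarrow\ \alpha = \mathbf{a},
\end{align*}
```

where the final implication uses the nonsingularity of $\mathbf{R}_x(\alpha)$ under (A1).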

According to this theorem, it is no longer necessary to distinguish between the AR parameter $\mathbf{a}$ and the filter parameter $\alpha^*$, and thus the problem of estimating $\mathbf{a}$ becomes identical to that of estimating $\alpha^*$. Given a consistent estimator $\hat{\mathbf{a}}(\alpha)$ of the deterministic mapping $\mathbf{a}(\alpha)$, Theorem 4.1 gives rise to the idea of finding a fixed-point in the random mapping $\hat{\mathbf{a}}(\alpha)$ as an estimator of the AR parameter $\mathbf{a}$. We refer to this method as the parametric filtering (PF) method, and call the fixed-point, denoted by $\hat\alpha$, the PF estimator of $\mathbf{a}$. The corresponding $\hat\omega_k$, obtained from the zeros of the AR polynomial $A(z^{-1})$ in (2.1) with $\hat\alpha$ in place of $\mathbf{a}$, are called the PF frequency estimates.

To obtain the PF estimator $\hat\alpha$ which, by definition, is a fixed-point of $\hat{\mathbf{a}}(\alpha)$, we employ the so-called fixed-point iteration. Namely, starting with an initial guess $\hat\alpha_0$, we recursively calculate

$$\hat\alpha_m = \hat{\mathbf{a}}(\hat\alpha_{m-1}) \qquad (m = 1, 2, \ldots). \qquad (4.8)$$

The fixed-point $\hat\alpha$ is obtained, when convergence occurs, as the limiting value of $\hat\alpha_m$ as $m \to \infty$. Similar to some existing methods pertaining to special cases (Kay, 1984; Dragosevic and Stankovic, 1989), this procedure is an iterative filtering algorithm which can be implemented as follows: given the initial guess $\hat\alpha_0$ and for $m = 1, 2, \ldots$,

Step 1. Filter the data with $\{h_j(\alpha)\}$ such that $\alpha = \hat\alpha_{m-1}$;

Step 2. Compute the estimate $\hat{\mathbf{a}}(\hat\alpha_{m-1})$ from the filtered data;

Step 3. Set $\hat\alpha_m = \hat{\mathbf{a}}(\hat\alpha_{m-1})$; and

Step 4. Go to Step 1 with $m$ replaced by $m + 1$.
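The four steps above can be sketched in code for the single-sinusoid case. Everything specific in this sketch is an assumption made for illustration and is not taken from this section: $q = 1$ with $a = -2\cos\omega$, white Gaussian noise, a hypothetical AR(2) filter with $\theta_1 = \tfrac{1}{2}(1 + \eta^2)\alpha$ chosen so that (A2) holds for white noise, a stopping tolerance, and a clipping step that keeps the filter stable.

```python
import math
import random


def ar2_filter(y, alpha, eta):
    """Step 1 (hypothetical filter): z_t = -theta1*z_{t-1} - eta^2*z_{t-2} + y_t."""
    theta1 = 0.5 * (1.0 + eta * eta) * alpha
    z1 = z2 = 0.0
    out = []
    for value in y:
        z = -theta1 * z1 - eta * eta * z2 + value
        out.append(z)
        z1, z2 = z, z1
    return out


def ls_estimate(z):
    """Step 2: least squares a in z_t + a * z_{t-1} + z_{t-2} = 0 (case q = 1)."""
    num = sum(z[t - 1] * (z[t] + z[t - 2]) for t in range(2, len(z)))
    den = sum(z[t - 1] ** 2 for t in range(2, len(z)))
    return -num / den


def pf_iterate(y, alpha0, eta, max_iter=100, tol=1e-8):
    """Fixed-point iteration (4.8): alpha_m = a_hat(alpha_{m-1})."""
    alpha = alpha0
    for _ in range(max_iter):
        new_alpha = ls_estimate(ar2_filter(y, alpha, eta))
        new_alpha = max(-1.999, min(1.999, new_alpha))  # keep |alpha| < 2 (stable filter)
        if abs(new_alpha - alpha) < tol:                 # stopping rule (assumption)
            return new_alpha
        alpha = new_alpha                                # Step 3, then back to Step 1
    return alpha


random.seed(2)
n, beta, omega0, phi = 4000, 1.0, 0.9, 0.2
y = [beta * math.cos(omega0 * t + phi) + random.gauss(0.0, 1.0)
     for t in range(1, n + 1)]

alpha_hat = pf_iterate(y, alpha0=-2.0 * math.cos(1.1), eta=0.9)
omega_hat = math.acos(-alpha_hat / 2.0)  # frequency estimate from a = -2 cos(omega)
```

The initial guess corresponds to a frequency of 1.1, deliberately off the true value of 0.9; in this setting each pass pulls the filter toward the sinusoid, which mirrors the contraction described by (4.6).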


68 


Related  procedures  for  the  special  case  of  q  =  1  have  been  considered  by  He  and 
Kedem  (1989),  Yakowitz  (1991),  Kedem  and  Yakowitz  (1992).  See  also  Li  and  Kedem 
(1992),  and  Li,  Kedem,  and  Yakowitz  (1992)  for  a  somewhat  more  rigorous  treatment 
of  this  special  case. 

Remark 4.1 Suppose for the moment that the sinusoids are absent in $\{y_t\}$. From (4.1), it is readily shown that $\mathbf{a}(\alpha)$ coincides with $-\mathbf{R}_\epsilon^{-1}(\alpha)\,\mathbf{r}_\epsilon(\alpha)$ and hence becomes the identity mapping $\mathbf{a}(\alpha) = \alpha$ under (A2). Therefore, the parametrization property (A2) simply requires that each and every $\alpha$ in $A$ be a fixed-point of $\mathbf{a}(\alpha)$ in the absence of the signal. Moreover, when the signal is present but the filter $\{h_j(\alpha)\}$ fails to capture it, we still obtain $\mathbf{a}(\alpha) = \alpha$, since $\{y_t(\alpha)\}$ does not contain a sinusoidal part. In this case, little or no change is expected from the iteration (4.8) when initiated by $\alpha$. In other words, when the signal is not captured, the filter essentially remains where it started. This observation suggests that the presence of a sinusoidal signal in $\{y_t\}$ could be tested based on whether significant changes occur after the iteration (4.8) for a number of randomly selected initial guesses.

Remark 4.2 To parametrize the filter $\{h_j(\alpha)\}$ in accordance with (A2) requires
the autocorrelation function or the normalized spectrum of the noise $\{\epsilon_t\}$. If this
information is available, a linear filter can be designed to whiten the noise by the
classical theory of Wiener filtering. A filter $\{h_j(\alpha)\}$ with the property (A2) can
thus be obtained by cascading the whitening filter with a parametric filter (e.g., the
AR filter in Chapter 5) which satisfies (A2) when the noise is white. In the cases
where the autocorrelation function of the noise is not available but the noise can be modeled as
an AR or MA process, certain iterative procedures can be employed to estimate the
noise parameters and the sinusoidal frequencies alternately (see, e.g., Dragosevic and
Stankovic, 1989). Intuitively, however, the performance of the PF method should not
be too sensitive to the noise spectrum if bandpass filters are applied locally to frequency
clusters and the noise spectrum is sufficiently smooth. For instance, if three sinusoids
are present with two closely spaced and the other well separated from the first two, a
bandpass filter with $q = 1$ (e.g., the AR(2) filter in Chapter 5) can be applied locally to
the separated frequency, while another bandpass filter with $q = 2$ (e.g., the AR(4) filter in
Chapter 5) can be applied to the two closely-spaced frequencies.

4.1.3  Least  Squares  Estimator 

So far we have not yet specified the estimator $\hat{a}(\alpha)$. In fact, any estimator would
qualify as long as it converges (stochastically) to $a(\alpha)$ when the sample size $n$ tends
to infinity. In the following, we specialize the choice of $\hat{a}(\alpha)$ by considering the least
squares estimator from the filtered data where the filtering is completely based upon
the observations $y_t$.

Notice that given the finite data record $\{y_1, \ldots, y_n\}$, the filtered process $\{y_t(\alpha)\}$
can only be approximated by

$$\hat{y}_t(\alpha) := \sum_{j=0}^{t-1} h_j(\alpha)\, y_{t-j} \qquad (t = 1, \ldots, n). \tag{4.9}$$

Therefore, the estimator $\hat{a}(\alpha)$ should be defined on the basis of $\{\hat{y}_1(\alpha), \ldots, \hat{y}_n(\alpha)\}$
instead of the unavailable data $\{y_1(\alpha), \ldots, y_n(\alpha)\}$. Motivated by the estimator $\hat{a}_{LS}(\alpha)$,
which requires the unavailable $\{y_t(\alpha)\}$, we may take

$$\hat{a}(\alpha) := -\{Q^T \hat{Y}^T(\alpha)\, \hat{Y}(\alpha)\, Q\}^{-1} Q^T \hat{Y}^T(\alpha)\, \hat{y}(\alpha) \tag{4.10}$$

where $\hat{Y}(\alpha)$ and $\hat{y}(\alpha)$ are the data matrices defined from $\{\hat{y}_t(\alpha)\}$ in the same way as
$Y$ and $y$ in (2.6). It is readily shown that $\hat{a}(\alpha)$ in (4.10) is the least squares estimator
that minimizes the criterion $\|\hat{y}(\alpha) + \hat{Y}(\alpha)\, Q\, a\|^2$ with respect to $a$. According to the large sample
theory developed in Chapter 3 (Theorem 3.5), if the filter is uniformly strictly stable,
the least squares estimator $\hat{a}(\alpha)$ in (4.10) converges almost surely to $a(\alpha)$ as $n$ tends
to infinity. In other words, $\hat{a}(\alpha)$ is a strongly consistent estimator of $a(\alpha)$. The strong
consistency is also uniform in $\alpha$, so that many properties of $a(\alpha)$, as a deterministic
function of $\alpha$, are retained by its estimator $\hat{a}(\alpha)$. This observation turns out to be
very helpful in the statistical analysis of the PF estimator, as we shall consider shortly.

4.1.4  Relation  to  the  CM  Method 

The PF method described above is an extension of a procedure recently
proposed by He and Kedem (1989) and by Yakowitz (1991) for single frequency
estimation. A more systematic and rigorous treatment of this procedure and its statistical
properties can be found in Li and Kedem (1992), and Li, Kedem, and Yakowitz (1992),
where the procedure was referred to as the CM (or contraction mapping) method.
Indeed, for $q = 1$, we note that the parametrization property (A2) becomes

$$\alpha = -2\, r_1^\epsilon(\alpha) / r_0^\epsilon(\alpha) \tag{4.11}$$

where $\alpha := \alpha_1$, and the least squares estimator in (4.10) reduces to

$$\hat{a}(\alpha) = -\sum_{t=3}^{n} \hat{y}_{t-1}(\alpha)\,\{\hat{y}_t(\alpha) + \hat{y}_{t-2}(\alpha)\} \Big/ \sum_{t=3}^{n} \hat{y}_{t-1}^2(\alpha). \tag{4.12}$$

If we reparametrize the filter by $\vartheta := -\alpha/2$, then (4.11) can be written as $\vartheta = \rho_1^\epsilon(\vartheta)$,
where $\rho_1^\epsilon(\vartheta)$ stands for the first-order autocorrelation of $\{\epsilon_t(\alpha)\}$. This relation is
readily recognized as being the fundamental property required by the CM method for
the parametrization of the filter. Moreover, as we have seen in Chapter 3 (see also Li
and Kedem, 1992), $\hat{\rho}_1(\vartheta) := -\hat{a}(\alpha)/2$ is a uniformly and strongly consistent estimator
of the first-order autocorrelation of $\{y_t(\alpha)\}$ if the filter is uniformly strictly stable.
With this estimator, the fixed-point iteration in (4.8) becomes

$$\hat{\vartheta}_m = \hat{\rho}_1(\hat{\vartheta}_{m-1}) \qquad (m = 1, 2, \ldots),$$

which coincides with the iteration of the CM method that produces a sequence $\{\hat{\vartheta}_m\}$
for estimating the parameter $\vartheta^* := -\alpha^*/2 = \cos\omega_1$.
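For $q = 1$ the least squares estimator is a ratio of two sums and is easy to compute. The sketch below is an illustration rather than the author's code: the function names are hypothetical, and the input is assumed to be an already filtered series. It evaluates the estimator of (4.12) and the corresponding first-order autocorrelation estimate, and exercises them on a noiseless sinusoid, for which the identity $x_t + x_{t-2} = 2\cos(\omega)\, x_{t-1}$ makes the estimate equal to $-2\cos\omega$ up to rounding.

```python
import numpy as np

def ls_estimate_q1(y_filt):
    """Least squares estimate for q = 1 from a filtered series (hypothetical helper)."""
    y = np.asarray(y_filt, dtype=float)
    mid = y[1:-1]             # y_{t-1} for t = 3, ..., n
    outer = y[2:] + y[:-2]    # y_t + y_{t-2} for t = 3, ..., n
    return -np.dot(mid, outer) / np.dot(mid, mid)

def rho1_estimate(y_filt):
    """CM-style quantity: first-order autocorrelation estimate, -a_hat / 2."""
    return -0.5 * ls_estimate_q1(y_filt)

# Noiseless check (illustrative values): a pure sinusoid at frequency omega.
omega = 0.7
t = np.arange(1, 501)
x = np.cos(omega * t + 0.3)
a_hat = ls_estimate_q1(x)
rho_hat = rho1_estimate(x)
```

With noise present the two quantities are no longer exact, which is where the parametrized filter and the iteration come in.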

Statistical properties of the CM method have recently been studied by Li and Kedem
(1992) and Li, Kedem, and Yakowitz (1992). In these works (see also Chapter 5),
it was proved that under appropriate conditions the CM method provides a strongly
consistent estimator of $\omega_1$, and that the estimator is asymptotically normal with a
variance inversely related to the signal-to-noise ratio of the filtered data. In the next
section, we shall analyze the PF method along the same lines as in these works in order
to establish the strong consistency and asymptotic normality of the PF estimator $\hat{\alpha}$.

4.2  Statistical  Properties  of  the  PF  Estimator 

To investigate statistical properties of the PF method presented in the preceding
section, we shall answer the following questions:

i) Under what conditions does the random mapping $\hat{a}(\alpha)$ have a fixed-point $\hat{\alpha}$?

ii) Under what conditions does the fixed-point iteration in (4.8) converge to the
fixed-point $\hat{\alpha}$? and

iii) What limit and limiting distribution does the PF estimator $\hat{\alpha}$ possess as the
sample size $n$ tends to infinity?

This section provides a set of sufficient conditions under which a unique fixed-point $\hat{\alpha}$
exists and can be found in the vicinity of $\alpha^*$ by the iteration in (4.8) almost surely.
Under these conditions, $\hat{\alpha}$ is also shown to be strongly consistent and asymptotically
normal for estimating $\alpha^*$. For simplicity, these results are formulated in terms of the
least squares estimator $\hat{a}(\alpha)$ defined by (4.10).

4.2.1  Existence  and  Convergence 

Suppose that the filter $\{h_j(\alpha)\}$ also satisfies the following regularity conditions:

(A3) The filter $\{h_j(\alpha)\}$ is uniformly strictly stable in $A$.

(A4) The filter $\{h_j(\alpha)\}$ is continuously differentiable, and its derivatives are uniformly
strictly stable in $A$.

Under these additional conditions, together with (A1) and (A2), we first show that
the random mapping $\hat{a}(\alpha)$ is uniformly consistent for estimating $a(\alpha)$ up to the first
derivative, as summarized in the following lemma.

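Lemma 4.1 Suppose that (A1), (A3), and (A4) are satisfied. Then, both $\hat{a}(\alpha)$ and
$a(\alpha)$ are continuously differentiable. Moreover, as $n \to \infty$,

$$\hat{a}(\alpha) \xrightarrow{a.s.} a(\alpha) \quad\text{and}\quad \hat{a}'(\alpha) \xrightarrow{a.s.} a'(\alpha)$$

uniformly in $\alpha \in A$, where $\hat{a}'(\alpha)$ and $a'(\alpha)$ are Jacobian matrices of $\hat{a}(\alpha)$ and $a(\alpha)$.

Proof. According to Theorem 3.5, the sample autocovariances of the filtered data
are uniformly strongly consistent. Therefore, under (A3), we obtain

$$n^{-1} Q^T \hat{Y}^T(\alpha)\, \hat{Y}(\alpha)\, Q \xrightarrow{a.s.} \bar{R}_y(\alpha) \quad\text{and}\quad n^{-1} Q^T \hat{Y}^T(\alpha)\, \hat{y}(\alpha) \xrightarrow{a.s.} \bar{r}_y(\alpha)$$

uniformly in $\alpha \in A$ as $n \to \infty$. Since $\bar{R}_y(\alpha)$ is nonsingular under (A1), it follows that
$Q^T \hat{Y}^T(\alpha)\, \hat{Y}(\alpha)\, Q$ is nonsingular almost surely for sufficiently large $n$, and that

$$\hat{a}(\alpha) \xrightarrow{a.s.} -\bar{R}_y^{-1}(\alpha)\, \bar{r}_y(\alpha) = a(\alpha)$$

uniformly in $\alpha \in A$. The nonsingularity of $\bar{R}_y(\alpha)$ and $Q^T \hat{Y}^T(\alpha)\, \hat{Y}(\alpha)\, Q$, in connection
with (A4), also guarantees that $\hat{a}(\alpha)$ and $a(\alpha)$ are continuously differentiable.
Applying Theorem 3.6 proves that $\hat{a}'(\alpha) \xrightarrow{a.s.} a'(\alpha)$ uniformly in $\alpha \in A$. ◇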

In  the  sequel,  we  shall  also  make  use  of  the  following  lemma. 

Lemma 4.2 Let $C(\alpha)$ be the matrix defined in (4.7), and assume that (A1) and (A2)
are valid. Then the spectral radius of $C(\alpha^*)$ is less than 1. Moreover, if $C(\alpha)$ is
continuous at $\alpha^*$, then $a(\alpha)$ is differentiable at $\alpha^*$ and $a'(\alpha^*) = C(\alpha^*)$. Therefore,
being the unique fixed-point of $a(\alpha)$, $\alpha^*$ is attractive.

Proof. It is easy to verify from (4.7) that $C(\alpha)$ can be written as

$$C(\alpha) = \{I + \Gamma(\alpha)\}^{-1} \quad\text{with}\quad \Gamma(\alpha) := \bar{R}_\epsilon^{-1}(\alpha)\, \bar{R}_x(\alpha). \tag{4.13}$$

Let $\lambda_j$ and $p_j$ $(j = 1, \ldots, q)$ be the eigenvalues and corresponding eigenvectors of
$\Gamma(\alpha^*)$; then $1/(1 + \lambda_j)$ are eigenvalues of $C(\alpha^*)$, associated with eigenvectors $p_j$.
By definition (Ortega and Rheinboldt, 1970, p. 43), the spectral radius of the matrix
$C(\alpha^*)$ is given by $\varrho = \max\{1/|1 + \lambda_j|\} > 0$. Therefore, $\varrho < 1$ if and only if $|1 + \lambda_j| > 1$
for all $j$. On the other hand, since $\Gamma(\alpha^*)\, p_j = \lambda_j p_j$, it follows from (4.13) that

$$p_j^H \bar{R}_x(\alpha^*)\, p_j = \lambda_j \{p_j^H \bar{R}_\epsilon(\alpha^*)\, p_j\}.$$

Note that $\bar{R}_x(\alpha^*)$ and $\bar{R}_\epsilon(\alpha^*)$ are positive definite under (A1). Therefore, we obtain
$p_j^H \bar{R}_x(\alpha^*)\, p_j > 0$ and $p_j^H \bar{R}_\epsilon(\alpha^*)\, p_j > 0$ for all $j$. This, in turn, yields $\lambda_j > 0$ for all
$j$. As a consequence, we obtain $|1 + \lambda_j| = 1 + \lambda_j > 1$ for all $j$ and hence $\varrho < 1$.

To proceed with the proof, we note that $\varrho < 1$ implies the existence of a norm
$\|\cdot\|$ such that $\|C(\alpha^*)\| < 1$ (Ortega and Rheinboldt, 1970, Theorem 2.2.8, p. 44).
Moreover, under (A2), it follows from (4.6) that

$$\|a(\alpha) - \alpha^* - C(\alpha^*)\,(\alpha - \alpha^*)\| \le \|C(\alpha^*) - C(\alpha)\|\; \|\alpha - \alpha^*\|.$$

Since $C(\alpha)$ is continuous at $\alpha^*$ by assumption, and $a(\alpha^*) = \alpha^*$, we obtain

$$\|a(\alpha) - a(\alpha^*) - C(\alpha^*)\,(\alpha - \alpha^*)\| = o(\|\alpha - \alpha^*\|).$$

By definition (Ortega and Rheinboldt, 1970), the mapping $a(\alpha)$ is differentiable at
$\alpha^*$, and its derivative coincides with $C(\alpha^*)$. The fixed-point $\alpha^*$ of $a(\alpha)$ is attractive
since $\|a'(\alpha^*)\| = \|C(\alpha^*)\| < 1$. ◇
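The eigenvalue argument in the first part of the proof can be checked numerically. The following sketch uses randomly generated symmetric positive definite matrices as stand-ins for the signal and noise covariance matrices at the true parameter (these matrices, the dimension, and the helper name `random_spd` are illustrative assumptions, not quantities from the thesis) and confirms that the eigenvalues of $\Gamma$ are positive and that the spectral radius of $C = (I + \Gamma)^{-1}$ equals $1/(1 + \lambda_{\min}) < 1$.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(q, rng):
    """Random symmetric positive definite q-by-q matrix (illustrative stand-in)."""
    m = rng.standard_normal((q, q))
    return m @ m.T + q * np.eye(q)

q = 3
R_x = random_spd(q, rng)      # stands in for the filtered-signal covariance matrix
R_eps = random_spd(q, rng)    # stands in for the filtered-noise covariance matrix

gamma = np.linalg.solve(R_eps, R_x)       # Gamma = R_eps^{-1} R_x
C = np.linalg.inv(np.eye(q) + gamma)      # C = (I + Gamma)^{-1}, as in (4.13)

lam = np.linalg.eigvals(gamma)            # eigenvalues of Gamma (real and positive in theory)
spectral_radius = np.max(np.abs(np.linalg.eigvals(C)))
```

Because $\Gamma$ is similar to a symmetric positive definite matrix, its eigenvalues are real and positive even though $\Gamma$ itself is generally not symmetric.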

Based on these lemmas, the existence of the PF estimator $\hat{\alpha}$ as a fixed-point of
$\hat{a}(\alpha)$ and the convergence of the iteration in (4.8) to $\hat{\alpha}$ can be established as follows.

Theorem 4.2 Under (A1)-(A4), the following assertions hold almost surely, provided
that $n$ is sufficiently large.

a) There exists a neighborhood $S_\Delta(\alpha^*) := \{\alpha : \|\alpha - \alpha^*\| < \Delta\}$ of $\alpha^*$, with $\Delta$ being
independent of $n$, in which the random mapping $\hat{a}(\alpha)$ constitutes a contractive
mapping, and hence possesses a unique fixed-point $\hat{\alpha}$.

b) The sequence $\{\hat{\alpha}_m\}$ produced by the fixed-point iteration (4.8) converges to $\hat{\alpha}$ as
$m \to \infty$ provided that $\hat{\alpha}_0 \in S_\delta(\hat{\alpha})$, where $S_\delta(\hat{\alpha}) := \{\alpha : \|\alpha - \hat{\alpha}\| < \delta\}$ is a
neighborhood of $\hat{\alpha}$, with $\delta$ being independent of $n$.

Proof. Since $C(\alpha)$ is continuous under (A4), it follows from Lemma 4.2 that $a'(\alpha^*) =
C(\alpha^*)$, and that $\|a'(\alpha^*)\| < 1$ for some norm $\|\cdot\|$. Furthermore, the continuity of
$a'(\alpha)$ and $C(\alpha)$ under (A4) also guarantees the existence of a constant $0 < c < 1$ and
a neighborhood $S_{\Delta_0}(\alpha^*) := \{\alpha : \|\alpha - \alpha^*\| < \Delta_0\} \subset A$ such that

$$\|a'(\alpha)\| < c \quad\text{and}\quad \|C(\alpha)\| < c \tag{4.14}$$

for all $\alpha \in S_{\Delta_0}(\alpha^*)$. Let $\tilde{c} := (c + 1)/2 < 1$; then the uniform convergence of $\hat{a}'(\alpha)$ to
$a'(\alpha)$ implies that $\|\hat{a}'(\alpha)\| < \tilde{c}$ almost surely for all $\alpha \in S_{\Delta_0}(\alpha^*)$ when $n > N_0$, where
$N_0$ is independent of $\alpha$. On the other hand, using the mean-value theorem (Ortega
and Rheinboldt, 1970, p. 71) we can show that for all $n > N_0$

$$\|\hat{a}(\alpha_1) - \hat{a}(\alpha_2)\| \le \tilde{c}\, \|\alpha_1 - \alpha_2\| \tag{4.15}$$

almost surely and uniformly in $\alpha_1, \alpha_2 \in S_{\Delta_0}(\alpha^*)$. By definition (Stoer and Bulirsch,
1980, p. 251), this implies that the random mapping $\hat{a}(\alpha)$ is contractive on $S_{\Delta_0}(\alpha^*)$
almost surely. In particular, the inequality (4.15) is valid almost surely on the smaller
neighborhood $S_\Delta(\alpha^*) := \{\alpha : \|\alpha - \alpha^*\| < \Delta\}$ with $\Delta = \tfrac{1}{2}\Delta_0$ for all $n > N_0$.
Furthermore, by Lemma 4.1, the convergence of $\hat{a}(\alpha)$ to $a(\alpha)$ guarantees that

$$\|\hat{a}(\alpha^*) - \alpha^*\| = \|\hat{a}(\alpha^*) - a(\alpha^*)\| < (1 - \tilde{c})\,\Delta$$

almost surely for all $n > N_1$. Therefore, according to Theorem 5.2.3 of Stoer and
Bulirsch (1980), the mapping $\hat{a}(\alpha)$ has a unique fixed-point $\hat{\alpha}$ on $S_\Delta(\alpha^*)$ almost
surely whenever $n > \max(N_1, N_0)$. Part a) of the theorem is thus proved. To show
Part b), we take a constant $\delta$ such that $0 < \delta < \tfrac{1}{2}\Delta_0$. Then the neighborhood $S_\delta(\hat{\alpha})$
of the fixed-point $\hat{\alpha}$ is contained in $S_{\Delta_0}(\alpha^*)$ and hence the inequality (4.15) remains
valid almost surely and uniformly in $\alpha_1, \alpha_2 \in S_\delta(\hat{\alpha})$, provided that $n > \max(N_1, N_0)$.
By Theorem 5.2.2 of Stoer and Bulirsch (1980), the sequence $\{\hat{\alpha}_m\}$ produced by (4.8)
converges to $\hat{\alpha}$ as $m \to \infty$ almost surely, if $n > \max(N_1, N_0)$ and the initial value $\hat{\alpha}_0$
is chosen in the neighborhood $S_\delta(\hat{\alpha})$. ◇

Remark 4.3 The theory of numerical analysis tells us (Stoer and Bulirsch, 1980)
that the spectral radius $\varrho$ of $C(\alpha^*)$ is crucial to the rate of convergence of the
fixed-point iteration in (4.8). Indeed, the smaller the spectral radius $\varrho$ is, the faster is
the convergence to $\hat{\alpha}$. As seen in the proof of Lemma 4.2, $\varrho = 1/(1 + \lambda_{\min})$, where
$\lambda_{\min} := \min\{\lambda_j\}$. Therefore, to accelerate the fixed-point iteration, $\lambda_{\min}$ should be
made as large as possible. Notice that

$$\lambda_{\min} = \min_{p \ne 0} \frac{p^H \bar{R}_x(\alpha^*)\, p}{p^H \bar{R}_\epsilon(\alpha^*)\, p}.$$

In the case of $q = 1$, $\lambda_{\min}$ reduces to $r_0^x(\alpha^*)/r_0^\epsilon(\alpha^*)$, which is readily recognized to
be the signal-to-noise ratio of the filtered process $\{y_t(\alpha^*)\}$. For $q > 1$, $\lambda_{\min}$ can be
regarded as a generalized indicator of the amount of signal relative to the amount of
noise in the filtered process $\{y_t(\alpha^*)\}$. This implies that the fixed-point iteration can
be accelerated if the parametric filter enhances the sinusoidal signal when $\alpha$ takes on
values in a neighborhood of $\alpha^*$.
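As a rough numerical illustration of this remark for $q = 1$, consider the deterministic mapping and treat $C(\alpha)$ as constant near $\alpha^*$. This is a first-order sketch; the symbol $\gamma$ is introduced here only as shorthand for the filtered signal-to-noise ratio. The relation $a(\alpha) - \alpha^* = C(\alpha)(\alpha - \alpha^*)$, as used in (4.16), then suggests a geometric decay of the error:

```latex
\varrho = \frac{1}{1+\gamma}, \qquad
\gamma := \frac{r_0^x(\alpha^*)}{r_0^\epsilon(\alpha^*)}, \qquad
|\alpha_m - \alpha^*| \approx \varrho^{\,m}\, |\alpha_0 - \alpha^*| .
% \gamma = 9 gives \varrho = 0.1: each iteration gains roughly one decimal digit.
% \gamma = 1 gives \varrho = 0.5: each iteration only halves the error.
```

The contrast between the two cases shows why a filter that raises the filtered signal-to-noise ratio near $\alpha^*$ also shortens the iteration.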

Remark 4.4 In order to accommodate poor initial guesses, it is required to have
a large neighborhood of $\hat{\alpha}$ (the convergence region) in which the fixed-point
iteration in (4.8) converges. Theorem 4.2 gives a conservative estimate of this region,
namely the neighborhood $S_\delta(\hat{\alpha})$, which basically consists of those $\alpha$ for which
$\|\hat{a}'(\alpha)\| < \tilde{c} < 1$. Therefore, the filter should be selected to provide a large set of such
$\alpha$ in order to obtain a large convergence region.




4.2.2  Strong  Consistency  and  Asymptotic  Normality 

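Suppose that $\hat{\alpha}$ is the fixed-point of the random mapping $\hat{a}(\alpha)$ in the vicinity of $\alpha^*$.
In the following, we shall investigate asymptotic properties of $\hat{\alpha}$ as the sample size
$n$ tends to infinity. The following theorem claims the strong consistency of the PF
estimator $\hat{\alpha}$ for estimating $\alpha^*$.

Theorem 4.3 Suppose that (A1)-(A4) are satisfied, and let $\hat{\alpha}$ be the unique
fixed-point of $\hat{a}(\alpha)$ in $S_\Delta(\alpha^*)$, as given by Theorem 4.2. Then $\hat{\alpha}$ converges to $\alpha^*$ almost
surely as $n \to \infty$.

Proof. Since $\hat{a}(\hat{\alpha}) = \hat{\alpha}$ and $a(\alpha^*) = \alpha^*$, it follows from (4.6) that

$$\hat{\alpha} - \alpha^* = \delta\hat{a}(\hat{\alpha}) + a(\hat{\alpha}) - a(\alpha^*) = \delta\hat{a}(\hat{\alpha}) + C(\hat{\alpha})\,(\hat{\alpha} - \alpha^*) \tag{4.16}$$

where $\delta\hat{a}(\alpha) := \hat{a}(\alpha) - a(\alpha)$. By (4.14), we have $\|C(\alpha)\| < c < 1$ for any $\alpha \in S_\Delta(\alpha^*)$.
We find that

$$\|\hat{\alpha} - \alpha^*\| \le \|\delta\hat{a}(\hat{\alpha})\| + \|C(\hat{\alpha})\|\; \|\hat{\alpha} - \alpha^*\| \le \|\delta\hat{a}(\hat{\alpha})\| + c\, \|\hat{\alpha} - \alpha^*\|$$

and hence

$$\|\hat{\alpha} - \alpha^*\| \le (1 - c)^{-1}\, \|\delta\hat{a}(\hat{\alpha})\| \tag{4.17}$$

almost surely for large $n$. The uniform convergence of $\hat{a}(\alpha)$ to $a(\alpha)$, as claimed in
Lemma 4.1, guarantees that $\|\delta\hat{a}(\hat{\alpha})\| \xrightarrow{a.s.} 0$ as $n \to \infty$, which, together with (4.17),
proves the assertion. ◇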

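The asymptotic normality can also be established for the PF estimator $\hat{\alpha}$. To this
end, we first need the following lemma regarding the asymptotic normality of $\hat{a}(\alpha^*)$.

Lemma 4.3 Suppose that (A1)-(A3) are satisfied and that $E(\epsilon_t^4) < \infty$. Then,
as $n \to \infty$, $\sqrt{n}\,\delta\hat{a}(\alpha^*) = \sqrt{n}\,\{\hat{a}(\alpha^*) - a\}$ converges in distribution to a normal random
vector with mean zero and covariance matrix

$$V := \bar{R}_y^{-1}(\alpha^*)\; Q^T W(\alpha^*)\, Q\; \bar{R}_y^{-1}(\alpha^*)$$

where $W(\alpha^*) := [w_{ij}(\alpha^*)]$, $(i, j = 1, \ldots, 2q - 1)$, and

$$w_{ij}(\alpha^*) := 4 \sum_{\tau=0}^{\infty} \left\{ \sum_{u=0}^{2q} a_u\, r^\epsilon_{\tau+i-u}(\alpha^*) \right\} \left\{ \sum_{v=0}^{2q} a_v\, r^\epsilon_{\tau+j-v}(\alpha^*) \right\}. \tag{4.18}$$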

Proof. The proof utilizes the large sample theory in Chapter 3 concerning the
asymptotic normality of sample autocovariances of filtered data. First of all, we have

$$\delta\hat{a}(\alpha^*) = -\{(Q^T \hat{Y}^T \hat{Y} Q)^{-1} Q^T \hat{Y}^T \hat{y} + a\}.$$

Here, as well as in the following, the argument $\alpha^*$ is omitted for the sake of brevity.
As in the proof of Theorem 3.3, we can write

$$n^{-1} Q^T \hat{Y}^T \hat{Y} Q = \hat{\bar{R}} + o_P(n^{-1/2}) \quad\text{and}\quad n^{-1} Q^T \hat{Y}^T \hat{y} = \hat{\bar{r}} + o_P(n^{-1/2}), \tag{4.19}$$

where $\hat{\bar{R}} := Q^T \hat{R}\, Q$, $\hat{\bar{r}} := Q^T(\hat{r} + \hat{r}^B) = 2\, Q^T \hat{r}$, with

$$\hat{r}_j := n^{-1} \sum_{t=1}^{n-j} y_{t+j}(\alpha^*)\, y_t(\alpha^*) \qquad (j = 0, 1, \ldots, 2q - 1).$$

Using these results, in connection with the nonsingularity of $\hat{\bar{R}}$, we obtain

$$\delta\hat{a}(\alpha^*) = -(\hat{\bar{R}}^{-1} \hat{\bar{r}} + a) + o_P(n^{-1/2}) = -\hat{\bar{R}}^{-1} (\hat{\bar{r}} + \hat{\bar{R}}\, a) + o_P(n^{-1/2}). \tag{4.20}$$

Let $r_j := r_j^y(\alpha^*)$ for brevity. Then, according to Theorem 3.3, $\sqrt{n}\,(\hat{r}_j - r_j)$, $(j =
0, \ldots, 2q - 1)$ are asymptotically jointly normal with mean zero and covariance matrix
$V_r := [\sigma_{ij}]$, where


i  oo 

an  :=  2E^(a*)  coB(*wt)  cos^Wfc)  reT(a*)  cos(ruk) 

k=l  T=— oo 

+  (K-3)rKa>;(a*) 

OO 

+  E  kkk+^K) +'■:«(“’)  >•;-,(<»*)}  (4.2i) 

T=  — OO 

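for $i, j = 0, 1, \ldots, 2q - 1$. On the other hand, the vector $\hat{\bar{r}} + \hat{\bar{R}}\, a$ in (4.20) can be
regarded as the value of some function $f(u_0, \ldots, u_{2q-1})$ at $u_j = \hat{r}_j$, that is, $\hat{\bar{r}} + \hat{\bar{R}}\, a =
f(\hat{r}_0, \ldots, \hat{r}_{2q-1})$. For this function, it is also true, by (4.1), that

$$f(r_0, \ldots, r_{2q-1}) = \bar{r}_y(\alpha^*) + \bar{R}_y(\alpha^*)\, a = \bar{r}_y(\alpha^*) + \bar{R}_y(\alpha^*)\, a(\alpha^*) = 0.$$

Therefore, invoking Proposition 6.4.3 of Brockwell and Davis (1987) proves that

$$\sqrt{n}\,(\hat{\bar{r}} + \hat{\bar{R}}\, a) = \sqrt{n}\,\{f(\hat{r}_0, \ldots, \hat{r}_{2q-1}) - f(r_0, \ldots, r_{2q-1})\}$$

converges in distribution to a normal random vector with mean zero and covariance
matrix $V_f := F V_r F^T$, where $F$ is the Jacobian matrix of $f$ evaluated at $(r_0, \ldots, r_{2q-1})$.
It is easy to verify that

$$\hat{\bar{r}} + \hat{\bar{R}}\, a = Q^T(\hat{r} + \hat{r}^B + \hat{R}\, Q a) = Q^T [\hat{r} : \hat{R} : \hat{r}^B]\, \tilde{a},$$

where $\tilde{a} := [a_0, a_1, \ldots, a_{2q}]^T$. Simple algebra shows $F = [f_0, \ldots, f_{2q-1}]$, where

$$f_0 = Q^T \begin{bmatrix} a_1 \\ \vdots \\ a_{2q-1} \end{bmatrix} \quad\text{and}\quad f_j = Q^T \begin{bmatrix} a_{1-j} + a_{1+j} \\ \vdots \\ a_{2q-1-j} + a_{2q-1+j} \end{bmatrix}$$

for $j = 1, \ldots, 2q - 1$, with $a_k := 0$ for $k < 0$ and $k > 2q$. Upon noting that the $\sigma_{ij}$ given
by (4.21) are symmetric in the sense that $\sigma_{|i|,|j|} = \sigma_{ij}$, we can rewrite
$V_f$ as $V_f = Q^T B\, Q$, where $B := [b_{ij}]$, $(i, j = 1, \ldots, 2q - 1)$, with

$$b_{ij} := \sum_{u,v=0}^{2q} \sigma_{i-u,\, j-v}\; a_u a_v .$$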


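As can be seen from (4.21), there are three groups in the expression of $\sigma_{ij}$. The first
group involves the sinusoidal terms, all of which are cancelled out in the expression of
$b_{ij}$, because $A(z_k^{-1}) = 0$ for $k = 1, \ldots, q$ and hence

$$\sum_{v=0}^{2q} a_v \cos(\omega_k (j - v)) = 0 \qquad (j = 1, \ldots, 2q - 1;\ k = 1, \ldots, q).$$

The second group in (4.21) consists of $(\kappa - 3)\, r_i^\epsilon(\alpha^*)\, r_j^\epsilon(\alpha^*)$. It is easy to see that the
corresponding term in $V_f$ can be written as $(\kappa - 3)\, U U^T$, where

$$U := Q^T [r_\epsilon(\alpha^*) : R_\epsilon(\alpha^*) : r_\epsilon^B(\alpha^*)]\, \tilde{a} = \bar{r}_\epsilon(\alpha^*) + \bar{R}_\epsilon(\alpha^*)\, a = \bar{r}_\epsilon(\alpha^*) + \bar{R}_\epsilon(\alpha^*)\, \alpha^*.$$

Since $U = 0$ under (A2), this term also vanishes in $V_f$. Combining these results, we
obtain $V_f = Q^T B_0\, Q$ where $B_0 := [b_{ij}^{(0)}]$ with

$$b_{ij}^{(0)} := \sum_{\tau=-\infty}^{\infty} \sum_{u,v=0}^{2q} a_u a_v\, r^\epsilon_{\tau+i-u}(\alpha^*)\, \{r^\epsilon_{\tau+j-v}(\alpha^*) + r^\epsilon_{\tau-j+v}(\alpha^*)\}.$$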

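Furthermore, it is not difficult to verify that $B_0$ can be written compactly as

$$B_0 = \sum_{\tau=-\infty}^{\infty} A^T r_\tau(\alpha^*)\, r_\tau^T(\alpha^*)\, (A + \tilde{I} A),$$

where $A$ is a $(4q - 1)$-by-$(2q - 1)$ matrix of the form

$$A := \begin{bmatrix} a_{2q} & & 0 \\ \vdots & \ddots & \\ a_0 & & a_{2q} \\ & \ddots & \vdots \\ 0 & & a_0 \end{bmatrix} \tag{4.22}$$

and $r_\tau(\alpha^*) := [r^\epsilon_{\tau-2q+1}(\alpha^*), \ldots, r^\epsilon_{\tau+2q-1}(\alpha^*)]^T$. Since $\tilde{I} A = A \tilde{I}$ and $\tilde{I} Q = Q$, we
obtain $\tilde{I} A Q = A Q$. Therefore,

$$V_f = 2 \sum_{\tau=-\infty}^{\infty} Q^T A^T r_\tau(\alpha^*)\, r_\tau^T(\alpha^*)\, A\, Q.$$

This expression can be further simplified as

$$V_f = 4 \sum_{\tau=0}^{\infty} Q^T A^T r_\tau(\alpha^*)\, r_\tau^T(\alpha^*)\, A\, Q = Q^T W(\alpha^*)\, Q,$$

upon noting that $r_{-\tau}(\alpha) = \tilde{I}\, r_\tau(\alpha)$, $Q^T A^T r_0(\alpha^*) = U = 0$, and

$$W(\alpha^*) = 4 \sum_{\tau=0}^{\infty} A^T r_\tau(\alpha^*)\, r_\tau^T(\alpha^*)\, A.$$

Finally, since $\hat{\bar{R}} \xrightarrow{a.s.} \bar{R}_y(\alpha^*)$, by Slutsky's theorem (Lehmann, 1982, Lemma 4.1, pp.
432-433) we obtain from (4.20) that $\sqrt{n}\,\delta\hat{a}(\alpha^*) = -\hat{\bar{R}}^{-1} \sqrt{n}\,(\hat{\bar{r}} + \hat{\bar{R}}\, a) + o_P(1)$ converges
in distribution to $N(0, V)$. ◇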

With the aid of this lemma, we are now able to show the asymptotic normality of
the PF estimator $\hat{\alpha}$.

Theorem 4.4 Under the conditions in Theorem 4.3, $\sqrt{n}\,(\hat{\alpha} - \alpha^*)$ converges in
distribution as $n \to \infty$ to a normal random vector with mean zero and covariance matrix

$$V_\alpha = \bar{R}_x^{-1}(\alpha^*)\; Q^T W(\alpha^*)\, Q\; \bar{R}_x^{-1}(\alpha^*) \tag{4.23}$$

where $W(\alpha^*)$ is defined in Lemma 4.3.

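Proof. It follows from (4.16) that $\{I - C(\hat{\alpha})\}\,(\hat{\alpha} - \alpha^*) = \delta\hat{a}(\hat{\alpha})$. According to the
mean-value theorem (Ortega and Rheinboldt, 1970, p. 71), $\delta\hat{a}(\hat{\alpha})$ can be written as

$$\delta\hat{a}(\hat{\alpha}) = \delta\hat{a}(\alpha^*) + \left\{ \int_0^1 \delta\hat{a}'(\alpha^* + \lambda(\hat{\alpha} - \alpha^*))\, d\lambda \right\} (\hat{\alpha} - \alpha^*)$$

where $\delta\hat{a}'(\alpha)$ is the Jacobian matrix of $\delta\hat{a}(\alpha)$. Since $\delta\hat{a}'(\alpha) = \hat{a}'(\alpha) - a'(\alpha) \xrightarrow{a.s.} 0$
uniformly in $\alpha \in A$, as guaranteed by Lemma 4.1, we have

$$\int_0^1 \delta\hat{a}'(\alpha^* + \lambda(\hat{\alpha} - \alpha^*))\, d\lambda \xrightarrow{a.s.} 0.$$

In addition, the consistency of $\hat{\alpha}$, together with the continuity of $C(\alpha)$, implies
that $C(\hat{\alpha}) \xrightarrow{a.s.} C(\alpha^*)$. Therefore, by Slutsky's theorem, $\sqrt{n}\,(\hat{\alpha} - \alpha^*)$ has the same
asymptotic distribution as $\{I - C(\alpha^*)\}^{-1} \sqrt{n}\,\delta\hat{a}(\alpha^*)$. Invoking Lemma 4.3 shows that
$\sqrt{n}\,(\hat{\alpha} - \alpha^*)$ converges in distribution to $N(0, V_\alpha)$, where $V_\alpha := \{I - C(\alpha^*)\}^{-1}\, V\, \{I -
C(\alpha^*)\}^{-T}$. Furthermore, using the expression of $C(\alpha)$ in (4.13) and applying the
matrix-inversion formula (Haykin, 1986), we can write

$$\{I - C(\alpha)\}^{-1} = I + \Gamma^{-1}(\alpha) = \Gamma^{-1}(\alpha)\, \{I + \Gamma(\alpha)\}.$$

On the other hand, $\bar{R}_y(\alpha) = \bar{R}_x(\alpha) + \bar{R}_\epsilon(\alpha) = \bar{R}_\epsilon(\alpha)\, \{I + \Gamma(\alpha)\}$. Therefore, we
obtain $\{I - C(\alpha)\}^{-1} \bar{R}_y^{-1}(\alpha) = \Gamma^{-1}(\alpha)\, \bar{R}_\epsilon^{-1}(\alpha) = \bar{R}_x^{-1}(\alpha)$. Substituting this result in
the above expression of $V_\alpha$ completes the proof. ◇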
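The identity attributed above to the matrix-inversion formula can also be checked directly. The short derivation below is a sketch that uses only (4.13) and the decomposition of the filtered-data covariance matrix into signal and noise parts; it assumes that $\Gamma(\alpha)$ is nonsingular, which holds when the signal and noise covariance matrices are positive definite. Arguments are suppressed.

```latex
\begin{aligned}
I - C &= I - (I+\Gamma)^{-1} = (I+\Gamma)^{-1}\{(I+\Gamma) - I\} = (I+\Gamma)^{-1}\,\Gamma, \\
\{I - C\}^{-1} &= \Gamma^{-1}(I+\Gamma) = I + \Gamma^{-1}, \\
\{I - C\}^{-1}\,\bar{R}_y^{-1} &= \Gamma^{-1}(I+\Gamma)\,(I+\Gamma)^{-1}\,\bar{R}_\epsilon^{-1}
  = \Gamma^{-1}\bar{R}_\epsilon^{-1} = \bar{R}_x^{-1}.
\end{aligned}
```

The last line uses $\bar{R}_y = \bar{R}_\epsilon (I + \Gamma)$ and $\Gamma^{-1} = \bar{R}_x^{-1} \bar{R}_\epsilon$.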

Remark 4.5 The asymptotic variance $V_\alpha$ in (4.23) is seen to be inversely related
to the signal-to-noise ratio in the filtered process $\{y_t(\alpha^*)\}$. This is compatible with
the requirement that the filter should effectively enhance the sinusoids and suppress
the noise. According to Remark 4.3, the convergence of the fixed-point iteration in
(4.8) can also be accelerated by appropriate filters that improve the SNR.

4.3  Extension  to  Complex  Sinusoids 

A parallel theory of the PF method can be easily established for complex sinusoids in
additive noise. In fact, if $\{x_t\}$ is a sum of $p$ complex sinusoids given by

xt 

k  =  1 

with $0 < \omega_1 < \cdots < \omega_p < 2\pi$, then it satisfies a $p$th-order autoregressive (AR) equation
of the form $\sum_{j=0}^{p} a_j x_{t-j} = 0$, where the AR parameter vector $a := [a_1, \ldots, a_p]^T$ is
defined by the coefficients of the polynomial

$$\sum_{j=0}^{p} a_j z^{-j} = \prod_{k=1}^{p} (1 - z_k z^{-1})$$

with $z_k := \exp(i\omega_k)$. Given a finite data set $\{y_1, \ldots, y_n\}$ observed from (1.1), one of
the widely-used estimators of the AR parameter $a$ is given by

$$\hat{a}_{LS} := -(Y^H Y)^{-1} Y^H y \tag{4.24}$$

where $Y$ and $y$ are redefined by

$$Y := \begin{bmatrix} y_p & \cdots & y_1 \\ \vdots & & \vdots \\ y_{n-1} & \cdots & y_{n-p} \end{bmatrix} \quad\text{and}\quad y := \begin{bmatrix} y_{p+1} \\ \vdots \\ y_n \end{bmatrix}.$$

This estimator is known as the forward linear prediction (FLP) estimator (Kay and Marple,
1981), which minimizes the criterion $\|y + Y a\|^2$. Other procedures such as the
forward-backward linear prediction (FBLP) (Kay and Marple, 1981) are also applicable for
estimating the AR parameter $a$ with only a slight modification of $Y$ and $y$ in (4.24).

Introducing a parametric filter $\{h_j(\alpha)\}$ indexed by $\alpha := [\alpha_1, \ldots, \alpha_p]^T$, an
estimator $\hat{a}(\alpha)$ of the AR parameter $a$ can be obtained according to (4.24), with the data
matrices replaced by those of the filtered data $\{\hat{y}_t(\alpha)\}$ in (4.9). The parametrization
property (A2) retains its form, but $R_\epsilon(\alpha)$ and $r_\epsilon(\alpha)$ are now of the structure

Rf  := 


(4.25) 


r-p+l  r-p+ 2 


with $r_\tau$ replaced by $E\{\epsilon_{t+\tau}(\alpha)\, \bar{\epsilon}_t(\alpha)\}$. In this case, (A2) is readily recognized as
being the Yule-Walker equations. In other words, for complex sinusoids, (A2) can be
interpreted as the requirement that the filter be parametrized so that the parameter
$\alpha$ satisfies the Yule-Walker equations for the filtered noise. With this property being
fulfilled, the PF estimator $\hat{\alpha}$ is defined as the fixed-point of the random mapping $\hat{a}(\alpha)$
and can be obtained by the fixed-point iteration in (4.8).
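The forward linear prediction estimator in (4.24) can be sketched as an ordinary complex least squares problem. The code below is an illustration under assumptions that are not confirmed by the text: the function names are hypothetical, and the row layout of the data matrix (lagged values $y_{t-1}, \ldots, y_{t-p}$ predicting $y_t$ for $t = p+1, \ldots, n$) is the standard one. On noiseless data with two complex sinusoids, the zeros of the estimated AR polynomial recover the frequencies.

```python
import numpy as np

def flp_estimate(y, p):
    """Forward linear prediction estimate, minimizing ||y + Y a||^2.

    Assumed layout: each row of Y is (y_{t-1}, ..., y_{t-p}) and the target
    is y_t, for t = p+1, ..., n.
    """
    y = np.asarray(y, dtype=complex)
    n = y.size
    Y = np.column_stack([y[p - j:n - j] for j in range(1, p + 1)])
    target = y[p:]
    a, *_ = np.linalg.lstsq(Y, -target, rcond=None)   # solves Y a ~ -y
    return a

def frequencies_from_ar(a):
    """Angles in [0, 2*pi) of the zeros z_k of sum_j a_j z^{-j}, with a_0 = 1."""
    roots = np.roots(np.concatenate(([1.0], a)))
    return np.sort(np.mod(np.angle(roots), 2.0 * np.pi))

# Noiseless example with illustrative frequencies, amplitudes and phase.
true_omegas = np.array([0.9, 2.5])
t = np.arange(1, 65)
x = np.exp(1j * true_omegas[0] * t) + 0.5 * np.exp(1j * (true_omegas[1] * t + 0.4))
a_ls = flp_estimate(x, p=2)
omega_hat = frequencies_from_ar(a_ls)
```

In the PF setting the same computation would be applied to the filtered data rather than to the raw observations.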




Chapter  5 


PF  Method  with  AR  (All-Pole)  Filter 


Although its consistency is guaranteed by the asymptotic theory we developed in
Section 4.2, the accuracy of the PF estimator depends on the choice of the parametric
filter to be applied to the data (see also Remark 4.5). Intuitively, a “good” filter
should be bandpass, so that the sinusoidal signal can be effectively enhanced after
filtering. A useful example of such a filter is the AR filter (not to be confused with
the AR method) we shall consider in this chapter as an illustration of the PF method.

The AR filter, also known as the all-pole filter, is considered in the literature as
a filter that whitens the error term in the AR representation of $\{y_t\}$. In fact, as we have
seen in Chapter 2, the reason that Prony's estimator leads to inconsistent frequency
estimates is that $\{y_t\}$ in (1.1) does not obey an AR model, which requires the error $\{e_t\}$
to be uncorrelated (white) in the AR representation (2.4). Instead, the process $\{e_t\}$ is
colored, admitting the moving-average (MA) representation (2.14). Clearly, when $\{\epsilon_t\}$
is an i.i.d. random sequence, $\{e_t\}$ would be an MA process with the MA coefficients
being identical to the AR coefficients in (2.4). This observation gave rise to the idea of
iteratively whitening the error $\{e_t\}$ by AR filtering based on previously estimated AR
coefficients from the filtered data. Along this line is the iterative filtering algorithm
(IFA), proposed by Kay (1984), which estimates the AR coefficients from iteratively
filtered data by Burg's estimator and whitens the error term by an AR filter with poles
on the unit circle. This procedure updates the filter so that the coefficients of the filter
on the $(m+1)$st iteration coincide with the estimated AR coefficients from the previous
$m$th iteration. The procedure is known to provide very good frequency estimates even
for relatively low SNR. However, since IFA uses a filter whose poles are on the unit
circle, the bandwidth is very narrow and the iterative procedure requires a rather
precise initial guess. In a recent paper by Dragosevic and Stankovic (1989), iterative
least squares estimation was used in connection with an AR filter which has an extra
parameter to force the poles to be within the unit circle. This extra feature guarantees
the stability, and also controls the bandwidth of the filter. The resulting estimator
is referred to as the generalized least squares (GLS) estimator. (GLS without the
extra bandwidth parameter can be found in Matausek, et al. (1983), where instability
of AR filtering was reported, especially when the SNR is low.) Simulations indicate
that the GLS estimator (endowed with the bandwidth parameter) improves on the
performance of the IFA estimator for high SNR (Dragosevic and Stankovic, 1989). In
general, however, the GLS estimator is not consistent.

In  this  chapter,  we  show  that  the  AR  filter  used  by  Dragosevic  and  Stankovic 
(1989)  can  be  readily  modified  to  satisfy  the  parametrization  property  (A2).  With 
this  modification,  the  resulting  PF  estimator  outperforms  the  GLS  estimator  in  terms 
of  mean-squared  error,  and  especially  so  for  closely-spaced  frequencies. 

5.1  General  AR  Filter  and  PF  Estimator 

The AR($2q$) filter (or simply the AR filter) is a parametric filter defined by¹

$$y_t(\alpha) + \theta_1(\alpha)\, \eta\, y_{t-1}(\alpha) + \cdots + \theta_{2q}(\alpha)\, \eta^{2q}\, y_{t-2q}(\alpha) = y_t \tag{5.1}$$

where $\eta \in (0, 1]$ is the bandwidth parameter that, when taking on values less
than 1, pulls the poles of the filter inside the unit circle and controls the bandwidth of the
filter. The coefficients $\theta_j(\alpha)$ of the filter are functions of $\alpha$, and are symmetric in the
sense that

$$\theta_0(\alpha) = 1 \quad\text{and}\quad \theta_{2q-j}(\alpha) = \theta_j(\alpha) \qquad (j = 0, 1, \ldots, q).$$

¹Assuming zero initial values yields the filtered data $\hat{y}_1(\alpha), \ldots, \hat{y}_n(\alpha)$ for the calculation of $\hat{a}(\alpha)$.

In the GLS method (Dragosevic and Stankovic, 1989), this filter was employed with a
specific choice of the filter coefficients so that $\theta_j(\alpha) = \alpha_j$, that is,

$$\theta(\alpha) = \alpha \tag{5.2}$$

where $\theta(\alpha)$ is the $q$-dimensional vector

$$\theta(\alpha) := [\theta_1(\alpha), \ldots, \theta_q(\alpha)]^T.$$

A similar filter without $\eta$ (or, equivalently, with $\eta = 1$) was also used by Kay (1984)
for estimating complex sinusoids.

Let us assume that the additive noise $\{\epsilon_t\}$ in (1.2) is white with zero mean and
finite fourth moment. It will be shown that a very simple relationship between $\theta(\alpha)$
and $\alpha$ can be established so that the resulting filter possesses the parametrization
property (A2), and that the theoretical results in Chapter 4 can be applied to this
filter upon appropriately selecting $\alpha$ in a parameter space.

5.1.1  Parametrization 

As we have seen in Chapter 3, the PF method requires the filter to possess the
parametrization property (A2). For white noise $\{\epsilon_t\}$, the filtered noise $\{\epsilon_t(\alpha)\}$ is
readily recognized as being an AR process with the autocovariance function $r^\epsilon_\tau(\alpha)$
satisfying the equations

$$\sum_{j=0}^{2q} \eta^j\,\theta_j(\alpha)\, r^\epsilon_{\tau-j}(\alpha) = 0 \quad (\tau = 1, 2, \dots). \tag{5.3}$$

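In matrix notation, (5.3) can be written as

$$r_\epsilon(\alpha) + \eta^{2q}\,\tilde r_\epsilon(\alpha) = -R_\epsilon(\alpha)\, Q_\eta\, \theta(\alpha) \tag{5.4}$$

where $Q_\eta$ is a $(2q-1)$-by-$q$ matrix of the form

$$Q_\eta := \begin{bmatrix} Q_1 & 0 \\ 0^T & \eta^q \\ Q_2 & 0 \end{bmatrix}$$

with $Q_1 := \mathrm{diag}(\eta, \dots, \eta^{q-1})$ and $Q_2 := \eta^q Q_1 \tilde I$. Now, by pre-multiplying each side of
(5.4) with $2Q^T$, we obtain

$$(1 + \eta^{2q})\, \mathbf{r}_\epsilon(\alpha) = -2\, Q^T R_\epsilon(\alpha)\, Q_\eta\, \theta(\alpha).$$

Therefore, the parametrization property (A2) requires that

$$\theta(\alpha) = \tfrac12 (1 + \eta^{2q})\, \{Q^T R_\epsilon(\alpha)\, Q_\eta\}^{-1}\, \mathbf{R}_\epsilon(\alpha)\, \alpha. \tag{5.5}$$

On the other hand, a simple calculation shows that

$$Q^T R_\epsilon(\alpha)\, Q_\eta = \tfrac12 (1 + \eta^{2q})\, Q^T R_\epsilon(\alpha)\, Q\, T_\eta^{-1} = \tfrac12 (1 + \eta^{2q})\, \mathbf{R}_\epsilon(\alpha)\, T_\eta^{-1}$$

where $T_\eta$ is a $q$-by-$q$ diagonal matrix of the form

$$T_\eta := \mathrm{diag}\left\{ \frac{1+\eta^{2q}}{\eta+\eta^{2q-1}},\ \dots,\ \frac{1+\eta^{2q}}{\eta^{q-1}+\eta^{q+1}},\ \frac{1+\eta^{2q}}{2\eta^{q}} \right\}.$$

As a result, the required parametrization (5.5) is simplified to trivial linear equations

$$\theta(\alpha) = T_\eta\, \alpha. \tag{5.6}$$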

With the coefficients given by (5.6), the AR filter in (5.1) satisfies (A2) so that a
sequence $\{\hat\alpha_m\}$ can be generated by the fixed-point iteration in (4.8) to estimate the
AR parameter $a$.
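The parametrization (5.6) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it builds $T_\eta$ and the $\eta$-weighted $(2q-1)$-by-$q$ matrix of (5.4), and verifies the "simple calculation" above for an arbitrary symmetric Toeplitz matrix standing in for the noise autocovariance matrix. The function names, the test values of `q`, `eta` and `alpha`, and the assumption that the unweighted matrix is the $\eta = 1$ case of the weighted one are all illustrative choices.

```python
import numpy as np

def t_eta(q, eta):
    """Diagonal matrix T_eta of (5.6): entries (1 + eta^(2q)) / (eta^j + eta^(2q-j)), j = 1..q."""
    j = np.arange(1, q + 1)
    return np.diag((1.0 + eta ** (2 * q)) / (eta ** j + eta ** (2 * q - j)))

def q_matrix(q, eta):
    """(2q-1)-by-q matrix sending theta to [eta^j * theta_j], j = 1..2q-1, with theta_{2q-j} = theta_j."""
    Q = np.zeros((2 * q - 1, q))
    for j in range(1, q + 1):
        Q[j - 1, j - 1] = eta ** j
        if j < q:
            Q[2 * q - j - 1, j - 1] = eta ** (2 * q - j)
    return Q

def symmetric_toeplitz(r):
    """Symmetric Toeplitz matrix with (i, j) entry r[|i - j|]."""
    idx = np.arange(len(r))
    return np.asarray(r)[np.abs(idx[:, None] - idx[None, :])]

q, eta = 3, 0.9                                    # hypothetical test values
rng = np.random.default_rng(0)
R = symmetric_toeplitz(rng.standard_normal(2 * q - 1))

Q1, Qeta, T = q_matrix(q, 1.0), q_matrix(q, eta), t_eta(q, eta)
lhs = Q1.T @ R @ Qeta
rhs = 0.5 * (1 + eta ** (2 * q)) * Q1.T @ R @ Q1 @ np.linalg.inv(T)
identity_holds = np.allclose(lhs, rhs)

alpha = np.array([-1.2, 0.7, 0.3])                 # hypothetical parameter value
theta = T @ alpha                                  # parametrization (5.6)
```

For $q = 1$ the single entry of `t_eta(1, eta)` is $(1+\eta^2)/(2\eta)$, which is the scalar relation used later in (5.25).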

As compared to (5.2), we observe that when $\eta < 1$, the parametrization of the PF
method differs from that of the GLS method. It is this difference that makes the PF
estimator consistent for any bandwidth parameter $\eta < 1$, while the GLS estimator is
inconsistent. Note that the PF and GLS methods coincide when $\eta = 1$. Moreover,
the iterative filtering algorithm (IFA) of Kay (1984) is easily seen to correspond to the
complex version of the PF method with $\eta = 1$. The difference is that Burg's estimator
was employed in IFA, instead of the least squares, in order to guarantee the stability
of AR filtering (Kay, 1984).

5.1.2  Parameter  Space 

For the theoretical analysis in Chapter 4 to hold, the AR filter (5.1), with $\theta(\alpha)$ given
by (5.6), is required to satisfy the regularity conditions (A3) and (A4). Therefore,
the parameter $\alpha$ must take on values within a certain parameter space in which these
conditions are guaranteed.

For any $\eta < 1$, let $\Theta(\eta)$ be the collection of $\theta := [\theta_1, \dots, \theta_q]^T$ for which the
polynomial $Q(z^{-1}) := \sum_{j=0}^{2q} \theta_j z^{-j}$ with $\theta_0 = 1$ and $\theta_{2q-j} = \theta_j$ has $2q$ strictly complex
zeros (with non-zero imaginary part) on or inside the circle $|z| = (1-\delta)/\eta$, where
$\delta \in (0,1)$ is a small real number such that $1 - \delta > \eta$. Let $\mathcal{A}(\eta)$ be the set of $\alpha$ such
that $\theta(\alpha) \in \Theta(\eta)$, namely,

$$\mathcal{A}(\eta) := \{\alpha : \theta(\alpha) \in \Theta(\eta)\}. \tag{5.7}$$

Then, for $\alpha \in \mathcal{A}(\eta)$, we have the following theorem which claims the validity of (A3)
and (A4) for the AR filter (5.1) with $\theta(\alpha)$ given by either (5.2) or (5.6).

Theorem 5.1 Let $\mathcal{A}(\eta)$ be the parameter space defined by (5.7). Then the AR filter
(5.1) satisfies (A3) and (A4) for $\alpha \in \mathcal{A}(\eta)$ if $\theta(\alpha)$ is given by (5.2) or (5.6).

To prove this theorem, we first need the following lemma.

Lemma 5.1 For any $\alpha \in \mathcal{A}(\eta)$, the poles of the AR filter (5.1) either appear on the
circle $|z| = \eta < 1-\delta$, or occur in reciprocal pairs within the band $\eta^2/(1-\delta) \le |z| \le 1-\delta$.
Consequently, the AR filter is stable for all $\alpha \in \mathcal{A}(\eta)$.


Proof. By definition, $\alpha \in \mathcal{A}(\eta)$ implies that $\theta := \theta(\alpha) \in \Theta(\eta)$. Therefore, there
exist constants $0 < \lambda_1 \le \cdots \le \lambda_q < \pi$ and $0 < \rho_k \le (1-\delta)/\eta$ such that

$$Q(z^{-1}) := \sum_{j=0}^{2q} \theta_j z^{-j} = \prod_{k=1}^{q} (1 - \rho_k e^{i\lambda_k} z^{-1})(1 - \rho_k e^{-i\lambda_k} z^{-1}).$$

Let $\nu_k := \eta\rho_k \exp(i\lambda_k)$ for $k = 1, \dots, q$ and $\nu_k := \eta\rho_{2q-k+1} \exp(-i\lambda_{2q-k+1})$ for $k =
q+1, \dots, 2q$. Since $Q(\eta z^{-1})$ can be factorized as

$$Q(\eta z^{-1}) = \prod_{k=1}^{2q} (1 - \nu_k z^{-1}) \tag{5.8}$$

the zeros of $Q(\eta z^{-1})$, namely the poles of the AR filter (5.1), are readily identified
to be the complex conjugate pairs $(\nu_k, \nu_{2q-k+1})$, $(k = 1, \dots, q)$. The symmetry of $\theta_j$
implies that the zeros of $Q(z^{-1})$ also constitute reciprocal pairs, and hence appear in
the region $\eta/(1-\delta) \le |z| \le (1-\delta)/\eta$. As a result, the poles of the AR filter must
occur within the band $\eta^2/(1-\delta) \le |z| \le 1-\delta$. Moreover, for distinct $\lambda_k$ (the $\lambda_k$
with multiplicity 1), the corresponding zeros of $Q(z^{-1})$ occur on the unit circle and
hence the corresponding poles of the AR filter lie on the circle $|z| = \eta$. The AR filter is
stable² for any $\alpha \in \mathcal{A}(\eta)$ because $1/Q(\eta z^{-1})$ is analytic within a small band containing
the unit circle, and hence its Taylor series expansion is absolutely summable on the
unit circle. $\diamond$

²Recall that a filter is stable if its impulse response sequence is absolutely summable.

Equipped with this lemma, we now present the proof of Theorem 5.1.

Proof of Theorem 5.1. Let $H(z;\alpha) := 1/P(z;\alpha)$ where

$$P(z;\alpha) := \sum_{j=0}^{2q} \theta_j(\alpha)\, \eta^j z^j.$$

According to Lemma 5.1, $H(z;\alpha)$ is analytic in $|z| < (1-\delta)^{-1}$ since the poles of
$H(z;\alpha)$ are on or outside the circle $|z| = (1-\delta)^{-1} > 1$ for any $\alpha \in \mathcal{A}(\eta)$. Given
$\rho \in (1-\delta, 1)$, the circle $|z| = \rho^{-1}$ is contained in the analytic region of $H(z;\alpha)$.
Therefore, by Cauchy's inequality (Markushevich, 1977, vol. 1, Theorem 14.7, p. 302),
we obtain

$$|H_j(0;\alpha)|/j! \le \rho^j \max_{|z|=\rho^{-1}} |H(z;\alpha)| \quad (j = 0, 1, \dots) \tag{5.9}$$

where $H_j(z;\alpha)$ stands for the $j$th derivative of $H(z;\alpha)$ with respect to $z$. Note that
$|1 - \zeta| \ge 1 - |\zeta|$ for any complex number $\zeta$ with $|\zeta| < 1$. Note also that $P(z^{-1};\alpha)$
admits the factorization (5.8). Since $|\nu_k z| \le (1-\delta)/\rho < 1$ for any $|z| = \rho^{-1}$, it follows
that $|1 - \nu_k z| \ge \varrho_0 := 1 - (1-\delta)/\rho > 0$ for all $k$. Therefore, from (5.8), we obtain
$|P(z;\alpha)| \ge \varrho_0^{2q}$ for any $|z| = \rho^{-1}$, and hence

$$\max_{|z|=\rho^{-1}} |H(z;\alpha)| \le \varrho_0^{-2q} \tag{5.10}$$

for any $\alpha \in \mathcal{A}(\eta)$. On the other hand, let $\{h_j(\alpha)\}$ be the impulse response of the AR
filter (5.1). Then, we obtain (Markushevich, 1977, vol. 1, Theorem 16.4, p. 349)

$$H(z;\alpha) = \sum_{j=0}^{\infty} h_j(\alpha) z^j \quad\text{and}\quad h_j(\alpha) = H_j(0;\alpha)/j! \quad (j = 0, 1, \dots). \tag{5.11}$$

This result, together with (5.9) and (5.10), implies that $|h_j(\alpha)| \le \varrho_0^{-2q} \rho^j$ for $j \ge 0$
and $\alpha \in \mathcal{A}(\eta)$. Hence, the condition (A3) is satisfied by the AR filter.

Furthermore, let $P_j(z;\alpha)$ be the $j$th derivative of $P(z;\alpha)$ with respect to $z$. Then,
it is readily shown that $H_1(z;\alpha) = -\{H(z;\alpha)\}^2 P_1(z;\alpha)$. Using the Leibniz rule of
differentiation, we obtain, for $j = 1, 2, \dots$,

$$H_j(z;\alpha) = -\sum_{u=0}^{j-1} \binom{j-1}{u} P_{j-u}(z;\alpha) \sum_{v=0}^{u} \binom{u}{v} H_v(z;\alpha)\, H_{u-v}(z;\alpha).$$

This, together with (5.11), yields the following recursive expression for $h_j(\alpha)$, namely,

$$h_j(\alpha) = -\sum_{u=0}^{j-1} (1 - u/j)\, \theta_{j-u}(\alpha)\, \eta^{j-u} \sum_{v=0}^{u} h_v(\alpha)\, h_{u-v}(\alpha) \tag{5.12}$$

for $j = 1, 2, \dots$, where $\theta_u(\alpha) := 0$ for $u > 2q$. Note that $h_0(\alpha)$ is differentiable since
$h_0(\alpha) = 1$ for all $\alpha \in \mathcal{A}(\eta)$. By induction on $j$, we can show from (5.12) that $h_j(\alpha)$
is also differentiable for all $j \ge 1$ due to the differentiability of $\{\theta_u(\alpha)\}$ when $\theta(\alpha)$ is
given by (5.2) or (5.6). On the other hand, let the prime $'$ denote the differentiation
with respect to any component of $\alpha$. Then we have $H'(z;\alpha) = -\{H(z;\alpha)\}^2 P'(z;\alpha)$.
Clearly, $H'(z;\alpha)$ is analytic in $|z| < (1-\delta)^{-1}$, just like $H(z;\alpha)$. Moreover, when $\theta(\alpha)$
is given by (5.2) or (5.6), the derivatives of $\theta_j(\alpha)$ are constants, and hence $|P'(z;\alpha)|$
is bounded by a constant $c$ for any $|z| = \rho^{-1}$ and $\alpha \in \mathcal{A}(\eta)$. Therefore, according to
(5.10), we obtain

$$\max_{|z|=\rho^{-1}} |H'(z;\alpha)| \le c\, \varrho_0^{-4q}. \tag{5.13}$$

On the other hand, since $H'(z;\alpha)$ is analytic in $|z| < (1-\delta)^{-1}$, it admits the Taylor
series expansion $H'(z;\alpha) = \sum_{j=0}^{\infty} g_j(\alpha) z^j$ with $g_j(\alpha) := H'_j(0;\alpha)/j!$ for $j \ge 0$, where
$H'_j(z;\alpha)$ is the $j$th derivative of $H'(z;\alpha)$ with respect to $z$. Since the order of
differentiation with respect to $z$ and to $\alpha$ is exchangeable in differentiating $H(z;\alpha)$, this
result, in connection with (5.11), yields $h'_j(\alpha) = g_j(\alpha)$ and hence

$$H'(z;\alpha) = \sum_{j=0}^{\infty} h'_j(\alpha) z^j$$

for $\alpha \in \mathcal{A}(\eta)$. Using (5.13) and Cauchy's inequality for $H'_j(z;\alpha)$, we obtain $|h'_j(\alpha)| \le
c\, \varrho_0^{-4q} \rho^j$ for all $j \ge 0$ and $\alpha \in \mathcal{A}(\eta)$. Therefore, the AR filter satisfies the condition
(A4). The theorem is thus proved. $\diamond$
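The recursion (5.12) can be compared against the impulse response obtained directly from the defining relation $H(z;\alpha)P(z;\alpha) = 1$. The sketch below is illustrative only: the coefficient values and the helper names are assumptions, and the symmetric coefficient vector is chosen so that the zeros of the unweighted polynomial lie on the unit circle.

```python
import numpy as np

def impulse_direct(p, n):
    """h_j from H(z) P(z) = 1: h_0 = 1 and sum_i p_i h_{j-i} = 0 for j >= 1."""
    h = np.zeros(n)
    h[0] = 1.0
    for j in range(1, n):
        h[j] = -sum(p[i] * h[j - i] for i in range(1, min(j, len(p) - 1) + 1))
    return h

def impulse_by_5_12(p, n):
    """h_j from the recursion (5.12), with p_j = theta_j * eta^j and p_j = 0 for j > 2q."""
    h = np.zeros(n)
    h[0] = 1.0
    for j in range(1, n):
        total = 0.0
        for u in range(j):
            coef = p[j - u] if j - u < len(p) else 0.0
            conv = sum(h[v] * h[u - v] for v in range(u + 1))
            total += (1.0 - u / j) * coef * conv
        h[j] = -total
    return h

eta = 0.9                                             # hypothetical bandwidth parameter
theta_full = np.array([1.0, -1.5, 2.2, -1.5, 1.0])    # symmetric coefficients, q = 2 (hypothetical)
p = theta_full * eta ** np.arange(len(theta_full))    # coefficients of P(z; alpha)

h_direct = impulse_direct(p, 30)
h_rec = impulse_by_5_12(p, 30)
```

Both routines return the same sequence up to rounding, which is a useful sanity check on the binomial bookkeeping behind (5.12).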

In order to effectively control the bandwidth of the AR filter by the parameter $\eta$,
we may consider another parameter space $\mathcal{A}_0(\eta)$ defined by

$$\mathcal{A}_0(\eta) := \{\alpha : \theta(\alpha) \in \Theta_0\} \tag{5.14}$$

where $\Theta_0$ is a subset of $\Theta(\eta)$, consisting of $\theta \in \Theta(\eta)$ for which the zeros of the
corresponding polynomial $\sum_{j=0}^{2q} \theta_j z^{-j}$ occur on the unit circle. For this parameter
space, it is readily shown that the poles of the AR filter are constrained to be on the
circle $|z| = \eta$, so that the parameter $\eta$ has a full control of the filter's bandwidth for
all $\alpha \in \mathcal{A}_0(\eta)$. Moreover, since $\mathcal{A}_0(\eta)$ is a subset of $\mathcal{A}(\eta)$, the AR filter still satisfies
the regularity conditions (A3) and (A4) when $\eta < 1$.


In practice, $\theta(\hat\alpha_m)$ does not always appear inside $\Theta_0$ when $\hat\alpha_m$ in (4.8) is obtained
from a finite data record. In case it falls outside, we may project it back into $\Theta_0$
by the following procedure. Suppose that for a given $\alpha$ the zeros of the polynomial
$\sum_{j=0}^{2q} \theta_j(\alpha) z^{-j}$ are of the form $\rho_k \exp(\pm i\lambda_k)$, $(k = 1, \dots, q)$, for some $\rho_k > 0$ and
$0 < \lambda_1 \le \cdots \le \lambda_q < \pi$. Then the projection of $\theta(\alpha)$, denoted by $\tilde\theta := [\tilde\theta_1, \dots, \tilde\theta_q]^T$,
can be obtained from the coefficients of the polynomial

$$\sum_{j=0}^{2q} \tilde\theta_j z^{-j} = \prod_{k=1}^{q} (1 - e^{i\lambda_k} z^{-1})(1 - e^{-i\lambda_k} z^{-1}).$$

By this projection, the poles of the AR filter are guaranteed to be restricted on the
circle $|z| = \eta$ so that the AR filtering is always stable as long as $\eta < 1$.
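The projection step just described can be sketched in a few lines. This is an illustrative implementation under stated assumptions rather than the procedure used in the thesis experiments: it assumes that all $2q$ zeros are strictly complex, relies on NumPy's `roots` and `poly` routines, and the function name is hypothetical.

```python
import numpy as np

def project_to_unit_circle(theta):
    """Move the zeros of sum_j theta_j z^{-j} radially onto the unit circle.

    theta holds [theta_1, ..., theta_q]; the full coefficient vector is
    completed by theta_0 = 1 and the symmetry theta_{2q-j} = theta_j.
    """
    theta = np.asarray(theta, dtype=float)
    q = len(theta)
    full = np.concatenate(([1.0], theta, theta[-2::-1], [1.0]))
    zeros = np.roots(full)
    upper = zeros[zeros.imag > 0]            # one representative per conjugate pair
    if len(upper) != q:
        raise ValueError("expected q strictly complex conjugate pairs of zeros")
    lam = np.sort(np.angle(upper))           # keep the angles, drop the radii
    on_circle = np.concatenate((np.exp(1j * lam), np.exp(-1j * lam)))
    coeffs = np.real(np.poly(on_circle))     # symmetric polynomial with unit-modulus zeros
    return coeffs[1:q + 1]

# Zeros at 1.2 exp(+-i) and (1/1.2) exp(+-i) are both mapped to exp(+-i).
rho, lam0 = 1.2, 1.0
c = np.cos(lam0)
theta_off = [-2 * c * (rho + 1 / rho), rho ** 2 + rho ** -2 + 4 * c ** 2]
theta_proj = project_to_unit_circle(theta_off)
```

A coefficient vector whose zeros are already on the unit circle is returned unchanged up to rounding error.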

5.2  Statistical  Properties 

For the properties in Section 4.2 to hold, the condition (A1) is the only thing left for
verification. It is readily shown that the AR filter satisfies this condition, since its
transfer function

$$H(\omega;\alpha) = \Big\{ \sum_{j=0}^{2q} \theta_j(\alpha)\, \eta^j e^{-ij\omega} \Big\}^{-1}$$

is nonzero for all $\alpha \in \mathcal{A}_0(\eta)$. In conclusion, the AR filter, defined by (5.1) and (5.6)
with $\eta < 1$, satisfies (A1)-(A4). Consequently, the corresponding PF estimator $\hat\alpha$
possesses the statistical properties in Section 4.2, regarding the existence, convergence,
strong consistency, and asymptotic normality, as can be summarized as follows.

Theorem 5.2 Suppose that $\{\epsilon_t\}$ is white with finite fourth moment. Then, for the AR
filter defined by (5.1) and (5.6) with $\eta < 1$, the results in Theorem 4.2, Theorem 4.3,
and Theorem 4.4 are valid, provided that $\alpha^* \in \mathcal{A}_0(\eta)$.

As mentioned before, the parameter $\eta$ plays the role of controlling the bandwidth
of the AR filter. Indeed, the closer $\eta$ is to 1, the narrower is the bandwidth of the AR
filter. Since the sinusoidal signal under investigation concentrates only on extremely
narrow bands (spikes) in the frequency domain, it is clear that in order to enhance the
signal by the AR filter, $\eta$ should be chosen as close to 1 as possible. From another point
of view, the parameter $\eta$ also determines the asymptotic behavior of the associated
PF estimator in terms of its covariance matrix. As a matter of fact, the asymptotic
covariance matrix $V_\alpha(\eta)$ of the PF estimator can be made arbitrarily small if $\eta$ is
chosen arbitrarily close to 1. More precisely, we have the following theorem concerning
the limit of $V_\alpha(\eta)$ as $\eta \to 1$.

Theorem 5.3 Let $V_\alpha(\eta)$ be the asymptotic covariance matrix of the PF estimator $\hat\alpha$
corresponding to the AR filter defined by (5.1) and (5.6) with $\eta < 1$. Then $V_\alpha(\eta) =
O((1-\eta)^3)$ as $\eta \to 1$, where $V_\alpha(\eta)$ is given by (4.23).

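Proof. To prove this theorem, we first need to rewrite $w_{ij}(\alpha^*)$ defined by (4.18) into
a more suitable form with the help of the spectral representation

$$r^\epsilon_\tau(\alpha^*) = \frac{\sigma_\epsilon^2}{2\pi} \int_{-\pi}^{\pi} |H(\omega;\alpha^*)|^2\, e^{i\tau\omega}\, d\omega. \tag{5.15}$$

To this end, let us define $d_j := \eta^j \theta_j(\alpha^*)/a_j$. Then, from (5.6), we obtain

$$d_j = (\eta^j + \eta^{2q+j})/(\eta^j + \eta^{2q-j}) \quad (j = 0, 1, \dots, 2q).$$

With this notation, we can rewrite (5.3) as

$$\sum_{j=0}^{2q} d_j a_j\, r^\epsilon_{\tau-j}(\alpha^*) = 0 \quad (\tau = 1, 2, \dots).$$

This, in connection with (5.15), gives

$$S_\tau := \sum_{j=0}^{2q} a_j\, r^\epsilon_{\tau-j}(\alpha^*) = \sum_{j=0}^{2q} (1 - d_j)\, a_j\, r^\epsilon_{\tau-j}(\alpha^*)
= \frac{\sigma_\epsilon^2}{2\pi} \int_{-\pi}^{\pi} |H(\omega;\alpha^*)|^2 \Big\{ \sum_{j=0}^{2q} (1 - d_j)\, a_j\, e^{-ij\omega} \Big\} e^{i\tau\omega}\, d\omega.$$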


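Introducing the polynomials

$$D(z) := \sum_{j=0}^{2q} (1 - d_j)\, a_j z^{2q-j} \quad\text{and}\quad P(z) := \sum_{j=0}^{2q} \theta_j(\alpha^*)\, \eta^j z^j,$$

we can write $S_\tau$ as a Cauchy integral of the form

$$S_\tau = \frac{\sigma_\epsilon^2}{2\pi i} \oint_{|z|=1} \frac{D(z)}{P(z)\, Q(z)}\, z^{\tau-1}\, dz \quad (\tau = 1, 2, \dots)$$

where $Q(z) := z^{2q} P(z^{-1})$. Note that $\alpha^* \in \mathcal{A}_0(\eta)$ implies $\theta(\alpha^*) \in \Theta_0$. This, in turn,
guarantees that the $2q$ zeros of $P(z^{-1})$ appear on the circle $|z| = \eta$ and can be expressed
as $\nu_k := \eta \exp(i\lambda_k)$, $(k = 1, \dots, 2q)$, with the $\lambda_k$ satisfying $0 < \lambda_1 < \cdots < \lambda_q < \pi$ and
$\lambda_k := -\lambda_{2q-k+1}$, $(k = q+1, \dots, 2q)$. As a result, we can write

$$P(z) = \prod_{k=1}^{2q} (1 - \bar\nu_k z) \quad\text{and}\quad Q(z) = \prod_{k=1}^{2q} (z - \nu_k)$$

where $\bar\nu_k$ is the complex conjugate of $\nu_k$. It follows from the residue theorem of complex
analysis (Markushevich, 1977) that

$$S_\tau = \sigma_\epsilon^2 \sum_{k=1}^{2q} \frac{D(\nu_k)}{P(\nu_k)\, Q'(\nu_k)}\, \nu_k^{\tau-1} \quad (\tau = 1, 2, \dots).$$

Using this formula, $w_{ij}(\alpha^*)$ in (4.18) can be written as

$$w_{ij}(\alpha^*) = 4 \sum_{\tau=0}^{\infty} S_{\tau+i}\, S_{\tau+j}
= 4\sigma_\epsilon^4 \sum_{k,k'=1}^{2q} \frac{D(\nu_k)\, \overline{D(\nu_{k'})}}{P(\nu_k)\, Q'(\nu_k)\, \overline{P(\nu_{k'})\, Q'(\nu_{k'})}} \cdot \frac{\nu_k^{i-1}\, \bar\nu_{k'}^{\,j-1}}{1 - \nu_k \bar\nu_{k'}} \tag{5.16}$$

where the overline denotes the complex conjugation.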

With the expression in (5.16), the behavior of $W(\alpha^*)$ as $\eta$ tends to 1 becomes
easier to investigate. In fact, we first note that $\nu_k \to z_k$ as $\eta \to 1$, where the $z_k$ are
zeros of the AR polynomial $A(z^{-1}) = \sum_{j=0}^{2q} a_j z^{-j}$ in (2.1). Moreover, $d_0 = 1$ and
$(1-\eta^2)^{-1}(1 - d_j) \to j/2$ for $j = 1, \dots, 2q$ as $\eta \to 1$. Therefore, we obtain

$$(1-\eta^2)^{-1} D(\nu_k) \to \sum_{j=1}^{2q} \tfrac12\, j\, a_j z_k^{2q-j} = \tfrac12\, z_k^{2q-1} A'(z_k^{-1}). \tag{5.17}$$


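Since $A(z) = \prod_{k=1}^{2q} (1 - z_k z)$ and $z_k^{-1} = \bar z_k$, it is readily shown that

$$(1-\eta^2)^{-1} P(\nu_k) = \prod_{k'=1,\, k' \ne k}^{2q} (1 - \bar\nu_{k'} \nu_k) \to \prod_{k'=1,\, k' \ne k}^{2q} (1 - \bar z_{k'} z_k) = -z_k A'(z_k)$$

and

$$Q'(\nu_k) = \prod_{k'=1,\, k' \ne k}^{2q} (\nu_k - \nu_{k'}) \to \prod_{k'=1,\, k' \ne k}^{2q} (z_k - z_{k'}) = z_k^{2q-1} \prod_{k'=1,\, k' \ne k}^{2q} (1 - z_{k'} \bar z_k) = -z_k^{2q-2} A'(\bar z_k).$$

Finally, $(1-\eta^2)/(1 - \nu_k \bar\nu_{k'}) = 1$ for $k' = k$, and $(1-\eta^2)/(1 - \nu_k \bar\nu_{k'}) \to 0$ as $\eta \to 1$ for
$k' \ne k$. Substituting all these limits in (5.16) yields

$$(1-\eta^2)\, w_{ij}(\alpha^*) \to \sigma_\epsilon^4 \sum_{k=1}^{2q} |A'(z_k)|^{-2} z_k^{i-j} \quad (i, j = 1, \dots, 2q-1)$$

as $\eta \to 1$. In matrix notation, we can write

$$(1-\eta^2)\, Q^T W(\alpha^*)\, Q \to 4\sigma_\epsilon^4\, \Sigma_0 \quad\text{with}\quad \Sigma_0 := Q^T S D_0 S^H Q \tag{5.18}$$

where $S$ is the Vandermonde matrix in (2.21) and $D_0$ is the $2q$-by-$2q$ diagonal matrix
defined by $D_0 := \tfrac14\, \mathrm{diag}\{|A'(z_1)|^{-2}, \dots, |A'(z_{2q})|^{-2}\}$.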

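To complete the proof, it suffices to consider the behavior of $\mathbf{R}_x(\alpha^*)$ as $\eta$ tends to
1. For this purpose, we notice that $|H(\omega_k;\alpha^*)| = 1/|P(z_k^{-1})|$ and

$$P(z_k^{-1}) = \sum_{j=0}^{2q} \theta_j(\alpha^*)\, \eta^j z_k^{-j} = \sum_{j=0}^{2q} d_j a_j z_k^{-j} = -z_k^{-2q} D(z_k).$$

Therefore, it follows from (5.17) that $(1-\eta^2)^2\, |H(\omega_k;\alpha^*)|^2 \to 4/|A'(z_k)|^2$. This,
together with the decomposition of $\mathbf{R}_x(\alpha^*)$ in Remark 2.1, implies that

$$(1-\eta^2)^2\, \mathbf{R}_x(\alpha^*) \to 8 r_0^x\, \Sigma \quad\text{with}\quad \Sigma := Q^T S D S^H Q \tag{5.19}$$

where $D := \tfrac14\, \mathrm{diag}\{\sigma_1^2/|A'(z_1)|^2, \dots, \sigma_{2q}^2/|A'(z_{2q})|^2\}$ with $\sigma_k^2 := \beta_k^2/\sum_{j=1}^{q} \beta_j^2$ for $k =
1, \dots, q$ and $\sigma_k^2 := \sigma_{2q-k+1}^2$ for $k = q+1, \dots, 2q$. Collecting (5.18) and (5.19), we
finally obtain

$$\lim_{\eta \to 1} \left( \frac{1+\eta^2}{1-\eta^2} \right)^3 V_\alpha(\eta) = \frac{1}{2\gamma^2}\, \Sigma^{-1} \Sigma_0 \Sigma^{-1} \tag{5.20}$$

where $V_\alpha(\eta)$ is the covariance matrix given by (4.23) and $\gamma := r_0^x/r_0^\epsilon = r_0^x/\sigma_\epsilon^2$ is the
signal-to-noise ratio of the data. The theorem is thus proved. $\diamond$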

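Remark 5.1 In the special case where the $q$ sinusoids have the same power, that
is, $\beta_k = \beta_1$ for all $k$, the limit in (5.20) reduces to

$$\lim_{\eta \to 1} \left( \frac{1+\eta^2}{1-\eta^2} \right)^3 V_\alpha(\eta) = \frac{q^2}{2\gamma^2}\, \Sigma_0^{-1} \tag{5.21}$$

because $D = q^{-1} D_0$ in this case.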

Remark 5.2 It is interesting to recognize, by comparing (5.18) with the decomposition
of $\mathbf{R}_x$ in Remark 2.1, that $\Sigma_0$ is identical to the autocovariance matrix of a
sinusoidal signal with the same $q$ frequencies as $\{x_t\}$, and with the amplitude corresponding
to $\omega_k$ equal to $1/|A'(z_k)|$. Similarly, $\Sigma$ in (5.19) has the same structure as $\Sigma_0$,
except that the amplitude associated with $\omega_k$ is replaced by $\sigma_k/|A'(z_k)|$. Therefore,
these matrices can be rewritten as

$$\Sigma_0 = Q^T \left[ \sum_{k=1}^{q} \frac{\cos(\omega_k (i-j))}{2\, |A'(z_k)|^2} \right] Q \quad\text{and}\quad \Sigma = Q^T \left[ \sum_{k=1}^{q} \frac{\sigma_k^2 \cos(\omega_k (i-j))}{2\, |A'(z_k)|^2} \right] Q$$

where $i, j = 1, \dots, 2q-1$.


5.3  Accuracy  of  the  PF  Estimator 

As guaranteed by Theorem 4.4, the PF method produces an estimator $\hat\alpha$ so that
$\sqrt{n}(\hat\alpha - \alpha^*)$ is asymptotically normally distributed and thus its estimation accuracy
is of order $O(n^{-1/2})$. However, for the AR filter (5.1), it has been shown that the
asymptotic covariance matrix of the corresponding PF estimator can be made arbitrarily
small as $\eta$ tends to 1. This indicates that a higher order accuracy could be
obtained with $\eta = 1$. Indeed, as discussed in Truong-Van (1990), Quinn and Fernandes
(1991), and Li, Kedem, and Yakowitz (1992) for the case of a single sinusoid ($q = 1$),
the PF estimator with the AR filter is able to achieve the same accuracy of order
$O(n^{-3/2})$ as the nonlinear least squares (NLS) approach (Hannan, 1973; Stoica and
Nehorai, 1989; Walker, 1971) in the limiting case of $\eta = 1$. We shall return to this
point in the next section where the special case of $q = 1$ is considered in detail.

For the case of multiple sinusoids, the fixed-point iteration in (4.8) with $\eta = 1$ can
be regarded as an algorithm that approximately calculates the NLS estimator in an
iterative fashion. In fact, as we have seen in Chapter 1, the NLS estimator minimizes
$J_n'$ in (1.13). Notice that $I - G(G^TG)^{-1}G^T$ in (1.13) is a projection operator that
projects an $n$-vector onto the orthogonal complement of the $2q$-dimensional column-space
of $G$. On the other hand, if we let $\alpha_j$ be the AR coefficients determined by
(2.1) for any given frequencies, and denote by $A$ the corresponding $n$-by-$(n-2q)$ matrix
of the structure (4.22), it is easy to verify that $A^TG = 0$. This implies that the
$n-2q$ linearly-independent columns of $A$ are orthogonal to the columns of $G$ and
thus span the $(n-2q)$-dimensional orthogonal complement of the column-space of $G$.
As a consequence, we obtain

$$I - G(G^TG)^{-1}G^T = A(A^TA)^{-1}A^T$$

and hence $J_n' = y^T A(A^TA)^{-1}A^T y$. Let $e(\alpha) := [e_{2q+1}(\alpha), \dots, e_n(\alpha)]^T$ where

$$e_t(\alpha) := \sum_{j=0}^{2q} \alpha_j\, y_{t-j}. \tag{5.22}$$

Simple algebra shows that $A^Ty = e(\alpha)$ where $y = [y_1, \dots, y_n]^T$. Therefore, $J_n'$ can be
rewritten as

$$J_n'(\alpha) = e^T(\alpha)\, (A^TA)^{-1}\, e(\alpha). \tag{5.23}$$

The NLS method is thus reduced to the problem of minimizing $J_n'(\alpha)$ in (5.23) with
respect to $\alpha$. To compute the NLS estimator, an iterative procedure can be employed
in accordance with the suggestions of Bresler and Macovski (1986). Indeed, for any
estimate $\hat\alpha_m$ of the AR parameter $a$, a new estimate can be obtained by minimizing
the criterion $J_n^{(m)}(\alpha) := e^T(\alpha)\, (A_m^TA_m)^{-1}\, e(\alpha)$ with respect to $\alpha$, where $A_m$ is the
matrix $A$ with $\hat\alpha_m$ in place of $\alpha$.
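The orthogonality $A^TG = 0$ and the resulting identity between the two projection operators are easy to confirm numerically. The sketch below makes several assumptions that are not taken from the text: $G$ is built with columns $\cos(\omega_k t)$ and $\sin(\omega_k t)$ for $t = 1, \dots, n$, the banded matrix is constructed so that its transpose applied to the data reproduces (5.22), and the frequencies and sample size are arbitrary test values.

```python
import numpy as np

def ar_coefficients(freqs):
    """Coefficients a_0..a_2q of prod_k (1 - 2 cos(w_k) z^-1 + z^-2)."""
    w = np.asarray(freqs, dtype=float)
    roots = np.concatenate((np.exp(1j * w), np.exp(-1j * w)))
    return np.real(np.poly(roots))

def banded_matrix(a, n):
    """n-by-(n-2q) matrix A with (A^T y)_t = sum_j a_j y_{t-j}, t = 2q+1..n."""
    m = len(a) - 1                      # m = 2q
    A = np.zeros((n, n - m))
    for col in range(n - m):
        A[col:col + m + 1, col] = a[::-1]
    return A

def sinusoid_matrix(freqs, n):
    """n-by-2q matrix with columns cos(w_k t) and sin(w_k t), t = 1..n."""
    t = np.arange(1, n + 1)[:, None]
    w = np.asarray(freqs, dtype=float)[None, :]
    return np.hstack((np.cos(w * t), np.sin(w * t)))

freqs, n = [0.6, 1.1], 40               # hypothetical frequencies and sample size
a = ar_coefficients(freqs)
A, G = banded_matrix(a, n), sinusoid_matrix(freqs, n)

proj_G = np.eye(n) - G @ np.linalg.solve(G.T @ G, G.T)
proj_A = A @ np.linalg.solve(A.T @ A, A.T)

y = np.random.default_rng(1).standard_normal(n)
e = A.T @ y                             # residual vector of (5.22)
```

The explicit $(n-2q)$-by-$(n-2q)$ solve in `proj_A` is the step that makes a direct evaluation of (5.23) expensive for long records.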


Let $\alpha$ be within a tiny neighborhood of the true AR parameter $a$. It is easy to see
that within such a neighborhood $e_t(\alpha)$ can be approximated by

$$e_t(\alpha) = \sum_{j=0}^{2q} \alpha_j\, x_{t-j} + \sum_{j=0}^{2q} \alpha_j\, \epsilon_{t-j} \approx \sum_{j=0}^{2q} a_j\, \epsilon_{t-j}.$$

Since $\{\epsilon_t\}$ is white, it can be shown from this approximation that the covariance
matrix of $e(\alpha)$ is approximately equal to $\sigma_\epsilon^2 A^TA$. Therefore, when $\hat\alpha_m$ and $\alpha$ are both
very close to $a$, the vector $e_m(\alpha) := (A_m^TA_m)^{-1/2} e(\alpha)$ can be regarded as a whitened
version of $e(\alpha)$. Resorting to the frequency-domain interpretation, this whitening
procedure can be approximately performed by applying to $e_t := e_t(\alpha)$ an AR filter of
the form (5.1) with $\eta \approx 1$ and $\alpha = \hat\alpha_m$. Let $\{e_t(\hat\alpha_m)\}$ be the output of the filter, then
$e_m(\alpha) \approx [e_{2q+1}(\hat\alpha_m), \dots, e_n(\hat\alpha_m)]^T$ and hence $J_n^{(m)}(\alpha) = \|e_m(\alpha)\|^2 \approx \sum_t \{e_t(\hat\alpha_m)\}^2$.
On the other hand, by interchanging the order of the AR filtering on $e_t(\alpha)$ and the
FIR filtering on $y_t$ defined by (5.22), we obtain $e_t(\hat\alpha_m) \approx \sum_{j=0}^{2q} \alpha_j\, y_{t-j}(\hat\alpha_m)$. Thus,
minimizing $J_n^{(m)}(\alpha)$ is approximately equivalent to minimizing $\sum_t \{\sum_{j=0}^{2q} \alpha_j\, y_{t-j}(\hat\alpha_m)\}^2$.
The latter yields the PF estimator $\hat\alpha_{m+1}$ given by (4.8).

An advantage of the PF method over the NLS method lies in its computational
simplicity inherited from the explicit least squares solution of $\hat a(\alpha)$. In fact, a direct
calculation of $J_n'(\alpha)$ in (5.23) requires the inversion of an $(n-2q)$-by-$(n-2q)$ matrix
$A^TA$, as compared to the inversion of a $q$-by-$q$ matrix $Q^TY^T(\alpha)\,Y(\alpha)\,Q$ plus a linear
recursive filtering for the computation of $\hat a(\alpha)$ in (4.10). The computational complexity
is clearly $O(n^3)$ versus $O(n)$.

Another advantage of the PF method comes from its less stringent requirement
of the initial estimates. As pointed out by many researchers, the NLS approach and
periodogram analysis (Rice and Rosenblatt, 1988; Stoica, Moses, Friedlander, and
Soderstrom, 1989; see also Chapter 1), as well as the PF estimator in the limiting
case of $\eta = 1$ (Truong-Van, 1990; Quinn and Fernandes, 1991; see also the next
section), require an initial estimate of accuracy $o(n^{-1})$ in order to obtain the optimal
solution by iterative procedures. On the other hand, with $\eta < 1$, an initial estimate
of accuracy $O(1)$ is sufficient for the fixed-point iteration in (4.8) to converge to the
corresponding PF estimator, as indicated by Theorem 4.2. This, however, is not the
end of the story. In fact, thanks to the flexibility in the choice of $\eta$, the estimation
accuracy of the PF estimator can be further improved upon increasing the value of
$\eta$ toward 1 as more reliable estimates become available to initiate the iteration. For
this purpose, we consider an increasing sequence of $\eta$ such that $0 < \eta_1 < \eta_2 < \cdots \to 1$
(or $0 < \eta_1 < \eta_2 < \cdots < \eta_K = 1$). To obtain the PF estimator $\hat\alpha(\eta_k)$, the fixed-point
iteration (4.8) is initiated not with an arbitrary $\alpha_0$ but with the previously obtained PF
estimator $\hat\alpha(\eta_{k-1})$. In so doing for $k = 1, 2, \dots$, a sequence of PF estimators $\{\hat\alpha(\eta_k)\}$ is
produced, each estimator in the sequence serving as the initial value of its successor.
As $k$ grows, the accuracy of $\hat\alpha(\eta_k)$ approaches that of the NLS estimator, but the
whole procedure, with $\eta_1$ taking on a relatively small value, is able to accommodate
poor initial guesses employed to yield $\hat\alpha(\eta_1)$. Another possible way of improving the
estimation accuracy is to increase $\eta$ after each iteration instead of carrying on the
iteration until convergence (Dragosevic and Stankovic, 1989; Kedem and Yakowitz,
1991). This strategy simplifies the computation to some extent but may result in
converging to a false location if $\eta$ grows too fast.
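The strategy of increasing $\eta$ in stages can be illustrated for a single sinusoid ($q = 1$). The following is only a sketch under stated assumptions: the least squares map is taken to be the regression of $y_t(\alpha) + y_{t-2}(\alpha)$ on $y_{t-1}(\alpha)$, which is assumed here to be the $q = 1$ form of (4.10); the filter uses the $q = 1$ parametrization $\theta(\alpha) = (1+\eta^2)\alpha/(2\eta)$ implied by (5.6); and the signal, noise level, $\eta$ schedule, iteration count, and clipping rule are all hypothetical choices.

```python
import numpy as np

def ar2_filter(y, theta, eta):
    """Filter (5.1) for q = 1 with zero initial values:
    y_t(a) + theta*eta*y_{t-1}(a) + eta^2*y_{t-2}(a) = y_t."""
    out = np.zeros(len(y))
    for t in range(len(y)):
        prev1 = out[t - 1] if t >= 1 else 0.0
        prev2 = out[t - 2] if t >= 2 else 0.0
        out[t] = y[t] - theta * eta * prev1 - eta ** 2 * prev2
    return out

def ls_map(y, alpha, eta):
    """Least squares coefficient of the symmetric AR(2) model fitted to the filtered data."""
    theta = (1 + eta ** 2) / (2 * eta) * alpha
    yf = ar2_filter(y, theta, eta)
    num = np.sum(yf[1:-1] * (yf[2:] + yf[:-2]))
    den = np.sum(yf[1:-1] ** 2)
    return -num / den

def pf_estimate(y, alpha0, etas, n_iter=20):
    """Fixed-point iteration run stage by stage; each stage starts from the previous estimate."""
    alpha = alpha0
    for eta in etas:
        bound = 4 * eta / (1 + eta ** 2)        # keeps |theta(alpha)| below 2
        for _ in range(n_iter):
            alpha = ls_map(y, alpha, eta)
            alpha = float(np.clip(alpha, -0.999 * bound, 0.999 * bound))
    return alpha

rng = np.random.default_rng(0)
n, omega_true = 2000, 0.8                        # hypothetical record length and frequency
t = np.arange(1, n + 1)
y = np.cos(omega_true * t + 0.3) + 0.5 * rng.standard_normal(n)

alpha_hat = pf_estimate(y, alpha0=0.0, etas=(0.8, 0.9, 0.95, 0.98))
omega_hat = float(np.arccos(-alpha_hat / 2))     # a_1 = -2 cos(omega_1)
```

Starting from `alpha0 = 0.0`, which corresponds to a frequency of $\pi/2$, the small first value of $\eta$ gives a wide filter that tolerates the poor initial guess, and the later stages narrow the filter around the current estimate.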

In the most interesting case where the frequencies are closely spaced, however, one
should be very cautious when increasing $\eta$. Simulations show that the bias of the PF
estimator based on a finite data record increases as $\eta$ approaches 1, while the variance
decreases. For closely-spaced frequencies, the bias eventually dominates the variance,
and hence an appropriate value of $\eta < 1$ should be considered to balance the trade-off
between the bias and variance in minimizing the mean-squared error.


5.4  Special  Cases  of  One  and  Two  Sinusoids 


In  the  preceding  sections,  we  have  discussed  the  general  properties  of  the  PF  method 
with  the  AR  filter.  Now  let  us  concentrate  on  two  special  cases  where  the  signal 
consists  of  a  single  sinusoid  and  two  sinusoids,  respectively.  For  these  examples,  we 
shall  provide  detailed  analysis  in  order  to  gain  some  further  insight  into  the  PF  method 
regarding  its  parameter  space  and  accuracy. 

5.4.1  A  Single  Sinusoid  in  White  Noise 

In the case of $q = 1$, the AR filter (5.1) becomes a second-order filter of the form

$$y_t(\alpha) + \theta(\alpha)\, \eta\, y_{t-1}(\alpha) + \eta^2 y_{t-2}(\alpha) = y_t \tag{5.24}$$

and the parametrization (5.6) reduces to the simple equation

$$\theta(\alpha) = \frac{1+\eta^2}{2\eta}\, \alpha. \tag{5.25}$$

Since the zeros of the polynomial $1 + \theta z^{-1} + z^{-2}$ are $\tfrac12(-\theta \pm \sqrt{\theta^2 - 4})$, it is necessary
that $|\theta| < 2$ in order that the zeros are strictly complex. In this case, the magnitude of
the conjugate pair of zeros is equal to 1, so that $\Theta_0 = \{\theta : |\theta| < 2\}$ and

$$\mathcal{A}_0(\eta) = \{\alpha : |\alpha| < 4\eta/(1+\eta^2)\}.$$

Moreover, since $a_1 = -2\cos\omega_1$, the results in Theorem 5.2 hold as long as $\eta$ is chosen
such that $\eta < 1$ and

$$-2\eta/(1+\eta^2) < \cos\omega_1 < 2\eta/(1+\eta^2).$$

Obviously, this can be achieved with a sufficiently large $\eta$ since $|\cos\omega_1| < 1$ for $\omega_1 \in
(0, \pi)$. Given $\alpha \in \mathcal{A}_0(\eta)$, since $\theta(\alpha) \in \Theta_0$, we can write $\theta(\alpha) = -2\cos\omega(\alpha)$ for
some $\omega(\alpha) \in (0, \pi)$. The poles of the AR filter are therefore $\nu = \eta\exp(i\omega(\alpha))$ and
$\bar\nu = \eta\exp(-i\omega(\alpha))$. In the case where $\{\epsilon_t\}$ is white noise, a formula given by He and
Kedem (1990, Lemma 2.1, Case iii) leads to

$$\rho_1(\alpha) = \frac{2\eta}{1+\eta^2} \cos\omega(\alpha) = -\frac{\eta}{1+\eta^2}\, \theta(\alpha)$$

where $\rho_1(\alpha)$ is the first-order autocorrelation of the filtered noise $\{\epsilon_t(\alpha)\}$. On the
other hand, a simple calculation shows that $-\mathbf{R}_\epsilon^{-1}(\alpha)\, \mathbf{r}_\epsilon(\alpha) = -2\rho_1(\alpha)$. Therefore,
with $\theta(\alpha)$ given by (5.25), we obtain $-\mathbf{R}_\epsilon^{-1}(\alpha)\, \mathbf{r}_\epsilon(\alpha) = \alpha$ for all $\alpha \in \mathcal{A}_0(\eta)$. This
verifies again that the AR filter possesses the parametrization property (A2) if it is
parametrized according to (5.25).
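The single-sinusoid relations above can be written out as a short numerical sketch. Everything here is illustrative: the function names are hypothetical, the lag-one autocorrelation uses the standard AR(2) result $\rho_1 = \phi_1/(1-\phi_2)$ rather than the cited lemma, and `min_eta` solves $2\eta/(1+\eta^2) = |\cos\omega_1|$ for the root below 1.

```python
import numpy as np

def theta_of_alpha(alpha, eta):
    """Parametrization theta(alpha) = (1 + eta^2) / (2 eta) * alpha for q = 1."""
    return (1 + eta ** 2) / (2 * eta) * alpha

def in_A0(alpha, eta):
    """Membership test for the q = 1 parameter space: |alpha| < 4 eta / (1 + eta^2)."""
    return abs(alpha) < 4 * eta / (1 + eta ** 2)

def min_eta(omega1):
    """Value of eta at which 2 eta / (1 + eta^2) equals |cos(omega1)|;
    any larger eta < 1 satisfies the strict inequality on cos(omega1)."""
    c = abs(np.cos(omega1))
    return 0.0 if c == 0 else (1 - np.sqrt(1 - c ** 2)) / c

def rho1_filtered_noise(alpha, eta):
    """Lag-one autocorrelation of white noise passed through the second-order filter,
    written as an AR(2) process with phi_1 = -theta*eta and phi_2 = -eta^2."""
    phi1, phi2 = -theta_of_alpha(alpha, eta) * eta, -eta ** 2
    return phi1 / (1 - phi2)

omega1 = 0.3                                   # hypothetical frequency close to zero
a1 = -2 * np.cos(omega1)
eta_lo = min_eta(omega1)
```

For a frequency as low as $\omega_1 = 0.3$ the admissible range of $\eta$ starts near 0.74, so very small bandwidth parameters are ruled out when the frequency is close to $0$ or $\pi$.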

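To compute the asymptotic variance of the PF estimator $\hat\alpha$, we note that with
$q = 1$ the matrix $V_\alpha(\eta)$ in (4.23) reduces to the simple form

$$\sigma_\alpha^2(\eta) = \frac{w_{11}(\alpha^*)}{(r_0^x(\alpha^*))^2}$$

where $\alpha^* := a_1 = -2\cos\omega_1$. Notice that we drop the argument $\eta$ in $w_{11}(\alpha^*)$ and $r_0^x(\alpha^*)$
for notational brevity. The following corollary claims that the expression (5.21) holds
for any $\eta$ without taking the limit.

Corollary 5.1 Let $\sigma_\alpha^2(\eta)$ and $\sigma_\omega^2(\eta)$ be the asymptotic variances of $\hat\alpha$ and the corresponding
$\hat\omega_1$ using the AR filter (5.1). Then, for any $\eta < 1$,

$$\sigma_\alpha^2(\eta) = \frac{4}{\gamma^2} \left( \frac{1-\eta^2}{1+\eta^2} \right)^3 \sin^2\omega_1 \quad\text{and}\quad \sigma_\omega^2(\eta) = \frac{1}{\gamma^2} \left( \frac{1-\eta^2}{1+\eta^2} \right)^3 \tag{5.26}$$

where $\gamma := \tfrac12 \beta_1^2/\sigma_\epsilon^2$ is the signal-to-noise ratio of $\{y_t\}$.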


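Proof. For $q = 1$, we can write $w_{11}(\alpha^*)$ in (5.16) as

$$w_{11}(\alpha^*) = 4\sigma_\epsilon^4 \left( \frac{2\, |D(\nu)|^2}{|P(\nu)|^2\, |Q'(\nu)|^2\, (1 - |\nu|^2)} + 2\, \Re\left\{ \frac{(D(\nu))^2}{(P(\nu))^2\, (Q'(\nu))^2\, (1 - \nu^2)} \right\} \right)$$

where $\nu := \eta\exp(i\omega)$ with $\omega := \omega(\alpha^*)$. Since $\theta(\alpha^*) = -2\cos\omega$ and $\alpha^* = a_1$, it follows
from (5.25) that

$$\cos\omega = -\frac{1+\eta^2}{4\eta}\, a_1 = \frac{1+\eta^2}{2\eta} \cos\omega_1. \tag{5.27}$$

Moreover, by definition, $D(z) = (1 + \tfrac12 a_1 z)(1 - \eta^2)$. Therefore, we have

$$\begin{aligned}
D(\nu) &= (1 + \tfrac12 a_1 \nu)(1 - \eta^2) \\
&= (1 - \eta^2 (1+\eta^2)^{-1} (e^{i\omega} + e^{-i\omega})\, e^{i\omega})(1 - \eta^2) \\
&= (1 - \eta^2 e^{i2\omega})(1+\eta^2)^{-1}(1 - \eta^2) \\
&= (1 - \nu^2)(1+\eta^2)^{-1}(1 - \eta^2)
\end{aligned}$$

where the second equality is due to the relation between $2\cos\omega = \exp(i\omega) + \exp(-i\omega)$
and $a_1$ given by (5.27). Since $P(z) = (1 - \nu z)(1 - \bar\nu z)$, it is readily shown that

$$P(\nu) = (1 - \nu^2)(1 - \eta^2).$$

In addition, since $Q'(z) = (z - \nu) + (z - \bar\nu)$, a simple calculation gives

$$(Q'(\nu))^2 = (\nu - \bar\nu)^2 = -4\eta^2 \sin^2\omega$$

and hence $|Q'(\nu)|^2 = 4\eta^2 \sin^2\omega$. Combining all these results, we obtain

$$w_{11}(\alpha^*) = \frac{2\sigma_\epsilon^4}{\eta^2 (1+\eta^2)^2 (1-\eta^2) \sin^2\omega} \left( 1 - \Re\left\{ \frac{1-\eta^2}{1-\nu^2} \right\} \right). \tag{5.28}$$

To proceed with the calculation, we note that

$$\begin{aligned}
|1 - \nu^2|^2 &= 1 - 2\eta^2 \cos(2\omega) + \eta^4 \\
&= (1+\eta^2)^2 - 4\eta^2 \cos^2\omega \\
&= (1+\eta^2)^2 \sin^2\omega_1
\end{aligned}$$

where the last equality is obtained according to (5.27). Similarly, we have
$\Re\{1 - \bar\nu^2\} = 1 - \eta^2 \cos(2\omega) = (1+\eta^2)\{1 - \tfrac12 (1+\eta^2) \cos^2\omega_1\}$.
Using these expressions, we can write

$$\begin{aligned}
1 - \Re\left\{ \frac{1-\eta^2}{1-\nu^2} \right\} &= 1 - \frac{1-\eta^2}{|1-\nu^2|^2}\, \Re\{1 - \bar\nu^2\} \\
&= \frac{2\eta^2 (1 - (4\eta^2)^{-1} (1+\eta^2)^2 \cos^2\omega_1)}{(1+\eta^2) \sin^2\omega_1} \\
&= \frac{2\eta^2 \sin^2\omega}{(1+\eta^2) \sin^2\omega_1}.
\end{aligned}$$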


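Substituting this result in (5.28) yields

$$w_{11}(\alpha^*) = \frac{4\sigma_\epsilon^4}{(1+\eta^2)^3 (1-\eta^2) \sin^2\omega_1}.$$

Finally, with the aid of (5.25), a straightforward calculation shows that

$$|H(\omega_1;\alpha^*)|^2 = \frac{1}{|1 + \theta(\alpha^*)\, \eta e^{-i\omega_1} + \eta^2 e^{-i2\omega_1}|^2} = \frac{1}{(1-\eta^2)^2 \sin^2\omega_1}.$$

Since $r_0^x(\alpha^*) = r_0^x\, |H(\omega_1;\alpha^*)|^2$, the expression of $\sigma_\alpha^2(\eta)$ in (5.26) follows immediately
upon noting that $\sigma_\alpha^2(\eta) = w_{11}(\alpha^*)/\{r_0^x(\alpha^*)\}^2$. Moreover, using Slutsky's theorem
(Brockwell and Davis, 1987, Proposition 6.4.1), we obtain $\sigma_\omega^2(\eta) = \tfrac14 \sigma_\alpha^2(\eta)/\sin^2\omega_1$
because $\omega_1 = \arccos(-\tfrac12 a_1)$ and hence $d\omega_1/da_1 = \tfrac12 (1 - \tfrac14 a_1^2)^{-1/2} = (2\sin\omega_1)^{-1}$. $\diamond$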
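The closed forms obtained in this proof can be cross-checked numerically by evaluating the $q = 1$ case of (5.16) directly. The sketch below is only a consistency check under assumed test values of $\eta$, $\omega_1$, $\sigma_\epsilon^2$ and $\beta_1$; the variable names are illustrative, and the signal variance is taken as $r_0^x = \beta_1^2/2$.

```python
import numpy as np

eta, omega1, sigma2, beta = 0.9, 0.7, 1.0, 1.5      # hypothetical test values
a1 = -2 * np.cos(omega1)
theta = (1 + eta ** 2) / (2 * eta) * a1             # parametrization for q = 1
omega = np.arccos(-theta / 2)                       # theta(alpha*) = -2 cos(omega)
nu = eta * np.exp(1j * omega)                       # pole of the AR filter

# Ingredients of (5.16) for q = 1.
D = (1 + 0.5 * a1 * nu) * (1 - eta ** 2)
P = (1 - nu ** 2) * (1 - eta ** 2)
Qp = nu - np.conj(nu)
w11 = 4 * sigma2 ** 2 * (
    2 * abs(D) ** 2 / (abs(P) ** 2 * abs(Qp) ** 2 * (1 - eta ** 2))
    + 2 * np.real(D ** 2 / (P ** 2 * Qp ** 2 * (1 - nu ** 2)))
)
w11_closed = 4 * sigma2 ** 2 / ((1 + eta ** 2) ** 3 * (1 - eta ** 2) * np.sin(omega1) ** 2)

# Squared gain of the filter at the signal frequency.
H2 = 1 / abs(1 + theta * eta * np.exp(-1j * omega1) + eta ** 2 * np.exp(-2j * omega1)) ** 2
H2_closed = 1 / ((1 - eta ** 2) ** 2 * np.sin(omega1) ** 2)

# Asymptotic variances of (5.26).
gamma = 0.5 * beta ** 2 / sigma2
var_alpha = w11 / (0.5 * beta ** 2 * H2) ** 2
var_alpha_closed = 4 / gamma ** 2 * ((1 - eta ** 2) / (1 + eta ** 2)) ** 3 * np.sin(omega1) ** 2
var_omega_closed = var_alpha_closed / (4 * np.sin(omega1) ** 2)
```

Repeating the check with $\eta$ closer to 1 shows the cubic decay of the variance in $1 - \eta^2$ stated in Theorem 5.3.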

Remark 5.3 For $q = 1$, the matrix $\Sigma_0$ in (5.21) reduces to $\Sigma_0 = \tfrac12 |A'(z_1)|^{-2}$. Since
$A(z) = (1 - z_1 z)(1 - \bar z_1 z)$, a simple calculation shows that $|A'(z_1^{-1})|^2 = |z_1 - \bar z_1|^2 =
4\sin^2\omega_1$. Therefore, $\Sigma_0 = (8\sin^2\omega_1)^{-1}$. This, in connection with (5.26), verifies our
claim that (5.21) holds for $q = 1$ without taking the limit.

To end our discussion on the single sinusoid case, let us consider the limiting
situation when $\eta = 1$. As aforementioned, the PF estimator with $q = 1$ and $\eta =
1$ coincides with the procedure discussed by Truong-Van (1990) and by Quinn and
Fernandes (1991). The following theorem summarizes their results.³

Theorem 5.4 Let $q = 1$ and $\eta = 1$. Then, for large $n$, the PF estimator $\hat\alpha$ exists
almost surely as the unique fixed-point of $\hat a(\alpha)$ in $S(\alpha^*) := \{\alpha : |\alpha - \alpha^*| \le c n^{-\delta}\}$ for
some fixed $1 < \delta < 3/2$ and $c > 0$. The iteration in (4.8) converges to $\hat\alpha$ almost surely
for large $n$ if $\alpha_0 \in S(\hat\alpha) := \{\alpha : |\alpha - \hat\alpha| \le c_1 n^{-\delta}\}$ for some $c_1 > 0$. Furthermore, as
$n \to \infty$, $n^\delta(\hat\omega_1 - \omega_1) \to 0$ almost surely for any $\delta < 3/2$ and $n^{3/2}(\hat\omega_1 - \omega_1)$ converges in
distribution to $N(0, 12/\gamma)$.

³In Quinn and Fernandes (1991), convergence was proved only for the alternative procedure of the
form $\hat\alpha_m = 2\hat a(\hat\alpha_{m-1}) - \hat\alpha_{m-1}$.

Proof. We briefly outline the proof of existence and convergence. The remaining
proof can be found in Truong-Van (1990), or in Quinn and Fernandes (1991). For the
existence of $\hat\alpha$, let us consider $S_0(\alpha^*) := \{\alpha : |\alpha - \alpha^*| \le c_0 n^{-\delta}\}$ for some $c_0 > 0$. It
can be shown (Quinn and Fernandes, 1991) that

«(«i)  -  0(02)  =  («i  -  aa)  1 1  +  Oa,. 

uniformly in $\alpha_1, \alpha_2 \in S_0(\alpha^*)$ when $n$ is sufficiently large. Moreover, it can be shown (Quinn and Fernandes, 1991) that

a(a*)  -a*  =  Oa,s .  ^»-1 

for large $n$. Therefore, with $1/2 < \bar c < 1$ and $c := c_0/2$, we obtain

$|\hat\alpha(\alpha_1) - \hat\alpha(\alpha_2)| < \bar c\,|\alpha_1 - \alpha_2|$ and $|\hat\alpha(\alpha^*) - \alpha^*| < (1 - \bar c)\,c\,n^{-\delta}$

almost surely and uniformly in $\alpha_1, \alpha_2 \in S(\alpha^*)$. This implies that $\hat\alpha(\alpha)$ is a contractive mapping in $S(\alpha^*)$ and hence the unique fixed-point $\hat\alpha$ exists in $S(\alpha^*)$ almost surely, provided that $n$ is sufficiently large. Following the proof of Theorem 4.2, we can show that the iteration (4.8) converges to $\hat\alpha$ with any initial guess taken in $S(\hat\alpha)$ if $c_1$ is chosen such that $0 < c_1 < c_0/2$. ◊

Remark 5.4 Theorem 5.4 remains valid if the noise is colored. In this case, we would have $n^{3/2}(\hat\omega_1 - \omega_1) \xrightarrow{D} N(0, 12/\gamma_1)$ where $\gamma_1$ is the signal-to-noise ratio given in Theorem 1.1. This implies that the PF method with $\eta = 1$ achieves the same estimation accuracy as periodogram analysis and nonlinear least squares. As we can see, the asymptotic variance depends on the noise only through its spectrum at the frequency of the sinusoid.

Remark 5.5 Theorem 5.4 tells us that only if the initial guess falls within a distance of $o(n^{-1})$ to $\hat\alpha$, and hence to the true AR parameter $\alpha^*$, is the iteration (4.8) guaranteed to converge to the fixed-point $\hat\alpha$. When the initial accuracy is poorer than $o(n^{-1})$, the fixed-point iteration may converge to a false location, if it converges at all.

From these remarks, we observe again the phenomenon of high estimation accuracy $O(n^{-3/2})$ versus stringent initial requirement $o(n^{-1})$, just as we saw earlier in Chapter 1 when we discussed the properties of periodogram analysis. This connection is by no means a coincidence. As a matter of fact, the PF method using the AR filter with $\eta = 1$ has a close relation with periodogram analysis. Indeed, when $\eta = 1$, the parametrization (5.25) reduces to the trivial equation $\theta(\alpha) = \alpha$ and the parameter space to $\mathcal{A}_0(1) = \Theta_0$. Let $\alpha = \theta(\alpha) = -2\cos\omega$. This equation defines a one-to-one correspondence between $\alpha \in \mathcal{A}_0(1)$ and $\omega \in (0, \pi)$, and the AR filter (5.24) becomes

$y_t(\alpha) - 2\cos\omega\, y_{t-1}(\alpha) + y_{t-2}(\alpha) = y_t.$

Assuming zero initial values, the filtered data $y_1(\alpha), \ldots, y_n(\alpha)$ can be explicitly written as (Truong-Van, 1990)

$$y_t(\alpha) = (\sin\omega)^{-1}\sum_{j=0}^{t-1}\sin((j+1)\omega)\,y_{t-j}. \qquad (5.29)$$

Recall (see Chapter 4) that the least squares mapping $\hat\alpha(\alpha)$ admits the representation (4.12). Therefore, if we denote by $\hat\rho(\vartheta)$ the first-order sample autocorrelation of the filtered data $\{y_t(\alpha)\}$, namely,

$$\hat\rho(\vartheta) = \frac12\,\sum_{t=3}^{n} y_{t-1}(\alpha)\{y_t(\alpha) + y_{t-2}(\alpha)\} \Big/ \sum_{t=3}^{n} y_{t-1}^2(\alpha)$$

where $\vartheta := -\alpha/2 = \cos\omega$, then the PF estimator can be obtained from the fixed-point $\hat\vartheta$ of the mapping $\hat\rho(\vartheta)$ according to $\hat\alpha = -2\hat\vartheta$ and $\hat\omega_1 = \arccos\hat\vartheta$.

From (5.29), we clearly observe the resemblance between the output sequence $\{y_t(\alpha)\}$ of the AR filter and the output sequence $\{y_t(\omega)\}$ of the complex exponential filter in (1.8). In fact, it is easy to verify that

$y_t(\alpha) = (\sin\omega)^{-1}\,\Im\{y_t(\omega)\} \qquad (t = 1, \ldots, n)$

where $\Im\{\cdot\}$ stands for the imaginary part of a complex number. With the help of this identity, the relation between periodogram analysis and the PF method with $\eta = 1$ becomes quite evident: the former seeks the $\omega$ that maximizes the power $|y_n(\omega)|^2$ of the last output ($t = n$) of the filter in (1.8), while the latter looks for $\omega$ such that $\vartheta = \cos\omega$ is a fixed-point of the first-order sample autocorrelation $\hat\rho(\vartheta)$ based on the output sequence of the same filter.
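The filtering and autocorrelation steps described above for $\eta = 1$ can be sketched numerically as follows. This is a minimal illustration, not the thesis's own implementation: the function names `ar_filter_eta1`, `closed_form_529` and `rho_hat` are hypothetical, zero initial values are assumed as in the text, and the closed form is the sum in (5.29) as reconstructed from the recursion.

```python
import numpy as np

def ar_filter_eta1(y, omega):
    """Recursion y_t(a) = 2cos(w) y_{t-1}(a) - y_{t-2}(a) + y_t with zero initial values."""
    out = np.zeros(len(y))
    c = 2.0 * np.cos(omega)
    for t in range(len(y)):
        prev1 = out[t - 1] if t >= 1 else 0.0
        prev2 = out[t - 2] if t >= 2 else 0.0
        out[t] = c * prev1 - prev2 + y[t]
    return out

def closed_form_529(y, omega):
    """Explicit sum: y_t(a) = (sin w)^{-1} sum_{j=0}^{t-1} sin((j+1)w) y_{t-j}."""
    y = np.asarray(y, dtype=float)
    out = np.zeros(len(y))
    for t in range(1, len(y) + 1):          # t is the 1-based time index
        j = np.arange(t)
        out[t - 1] = np.sum(np.sin((j + 1) * omega) * y[t - 1 - j]) / np.sin(omega)
    return out

def rho_hat(y, vartheta):
    """First-order sample autocorrelation of the filtered data, sums over t = 3..n."""
    z = ar_filter_eta1(np.asarray(y, dtype=float), np.arccos(vartheta))
    num = np.sum(z[1:-1] * (z[2:] + z[:-2]))   # y_{t-1} * (y_t + y_{t-2})
    den = np.sum(z[1:-1] ** 2)
    return 0.5 * num / den
```

A fixed point of `rho_hat` in `vartheta` then corresponds to a frequency estimate `arccos(vartheta)`; as the surrounding discussion stresses, with $\eta = 1$ such an iteration can only be expected to converge from a very accurate starting value.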

The impact of this observation is twofold. On the one hand, it explains why the PF method with $\eta = 1$ is able to achieve the same accuracy as periodogram analysis; on the other, it helps to explain why accurate initial guesses are required for convergence. Both questions can be answered by the behavior of the squared gain function $G_n(\omega)$ of the complex exponential filter in (1.8), as discussed in Chapter 1.

Because the complex exponential filter in (1.8) has an extremely narrow bandwidth of $O(n^{-1})$, the PF method with $\eta$ close or equal to 1 is able to operate locally in the frequency domain with little or no interference from other frequency components far away from the center of the filter. Therefore, in the case of multiple sinusoids where the frequencies are well separated, it is possible to deal with one sinusoidal frequency at a time, using the PF method for the single sinusoid case (Truong-Van, 1990). In so doing, we simply seek to find local attractive fixed-points4 of the mapping $\hat\alpha(\alpha)$, just as we seek to find local maxima of the periodogram in periodogram analysis. However, also like periodogram analysis, this method cannot be applied when the frequencies are closer than $O(n^{-1})$ because of the resolution limit of the filter. To cope with the more difficult situation of closely spaced frequencies, we must turn to the multivariate PF method in which the multiple frequencies are considered simultaneously.

4 Note that $\hat\alpha$ is essentially the unique (global) fixed-point of $\hat\alpha(\alpha)$ in the single sinusoid case.

5.4.2 Two Sinusoids in White Noise

Let us now consider the case of two sinusoids ($q = 2$) in additive white noise. In this case, $\Theta_0$ consists of all $\theta = [\theta_1, \theta_2]^T$ with

$\theta_1 = -2(\cos\lambda_1 + \cos\lambda_2)$ and $\theta_2 = 2(1 + 2\cos\lambda_1\cos\lambda_2)$ (5.30)

for some $0 < \lambda_1 < \lambda_2 < \pi$. Simple algebra shows that

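$$\cos\lambda_1 = \tfrac14\Bigl(-\theta_1 + \sqrt{\theta_1^2 - 4\theta_2 + 8}\Bigr),$$

$$\cos\lambda_2 = \tfrac14\Bigl(-\theta_1 - \sqrt{\theta_1^2 - 4\theta_2 + 8}\Bigr).$$

Therefore $\theta_1$ and $\theta_2$ must satisfy the inequalities

$$\Bigl|\tfrac14\bigl(-\theta_1 \pm \sqrt{\theta_1^2 - 4\theta_2 + 8}\bigr)\Bigr| < 1 \quad\text{and}\quad \theta_1^2 - 4\theta_2 + 8 > 0.$$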


It turns out by solving these inequalities that $\Theta_0$ can be expressed as

$$\Theta_0 = \{(\theta_1, \theta_2) : 2|\theta_1| - 2 < \theta_2 < \tfrac14\theta_1^2 + 2\}. \qquad (5.31)$$
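As a small numerical companion to (5.30) and the cosine formulas above, the following sketch maps a frequency pair to $\theta$ and back, and tests membership in $\Theta_0$. The function names are hypothetical. The membership test uses the two-sided inequality on $\theta_2$ together with an explicit guard $|\theta_1| < 4$; that guard is an addition made in this sketch (it follows from requiring both cosines to lie strictly between $-1$ and $1$) and is not quoted from the text.

```python
import math

def theta_from_freqs(lam1, lam2):
    """(5.30): theta1 = -2(cos l1 + cos l2), theta2 = 2(1 + 2 cos l1 cos l2)."""
    c1, c2 = math.cos(lam1), math.cos(lam2)
    return -2.0 * (c1 + c2), 2.0 * (1.0 + 2.0 * c1 * c2)

def freqs_from_theta(theta1, theta2):
    """Invert (5.30) when the discriminant is non-negative and both cosines are valid."""
    disc = theta1 ** 2 - 4.0 * theta2 + 8.0
    if disc < 0:
        raise ValueError("negative discriminant: theta is outside Theta_0")
    r = math.sqrt(disc)
    cos1 = 0.25 * (-theta1 + r)   # larger cosine, hence smaller angle
    cos2 = 0.25 * (-theta1 - r)
    return math.acos(cos1), math.acos(cos2)   # acos raises ValueError if |cos| > 1

def in_theta0(theta1, theta2):
    """Membership test; the |theta1| < 4 guard is this sketch's own assumption."""
    return (abs(theta1) < 4.0
            and 2.0 * abs(theta1) - 2.0 < theta2 < 0.25 * theta1 ** 2 + 2.0)
```

For example, the pair $(0.32\pi, 0.45\pi)$ used later in Figure 5.3 maps into $\Theta_0$ and is recovered by `freqs_from_theta`, whereas a point such as $(5, 8.2)$ satisfies the two-sided inequality on $\theta_2$ but yields cosines outside $(-1, 1)$, which is why the sketch adds the extra guard.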


Moreover, for $q = 2$, the parametrization (5.6) reduces to

tfi(a)"^T?ftl  and  *2(a)  =  _v~a2- 

According to (5.31), the parameter space $\mathcal{A}_0(\eta)$ is given by


Figure 5.1 shows $\mathcal{A}_0(\eta)$ for $\eta = 0.8$ together with $\Theta_0$ defined by (5.31).

From Figure 5.1, as well as (5.31) and (5.32), we observe that $\mathcal{A}_0(\eta)$ is contained in $\Theta_0$ for any $\eta < 1$, and that $\mathcal{A}_0(\eta)$ increases and eventually coincides with $\Theta_0$ as $\eta \to 1$. Therefore, for any given $\omega_1$ and $\omega_2$ satisfying $0 < \omega_1 < \omega_2 < \pi$, the requirement $a^* \in \mathcal{A}_0(\eta)$ in Theorem 5.2 can always be met by choosing $\eta$ close enough to 1.

On the other hand, for a given $\eta < 1$, the requirement $a^* \in \mathcal{A}_0(\eta)$ imposes a separation condition on the frequencies of the signal. As a matter of fact, in order that $a^* \in \mathcal{A}_0(\eta)$, the frequencies $\omega_1$ and $\omega_2$ should stay away from 0 and $\pi$, respectively, and from each other, by a certain amount depending on $\eta$. It is not too difficult to verify that a sufficient condition for (5.32) to be fulfilled is that

(  1  +  rf  \  ,  /  2q2 

uq,  7 r  —  uq  >  arccos  I  — p=====  j  and  uq  —  uq  >  arccos  — — - 


Figure 5.1: Parameter space $\mathcal{A}_0(\eta)$ with $\eta = 0.8$ in the case of two sinusoids. The dotted lines define the region $\Theta_0$.

To provide a complete picture of the separation condition, Figure 5.2 shows for $\eta = 0.8$ and 0.9 the set $\Omega_\eta$, which we refer to as the frequency space, of the frequency pair $(\omega_1, \omega_2)$ for which $a^* \in \mathcal{A}_0(\eta)$.

Figure 5.2: Frequency space with $\eta = 0.9$ (smallest) and 0.95 for the normalized frequency pair $(f_1, f_2) = (\omega_1, \omega_2)/\pi$. The region defined by dotted lines corresponds to the case of $\eta = 1$.

Let us now consider the characteristics of the AR filter at $a = a^*$, which determine the asymptotic behavior of the PF estimator, as indicated by Theorem 4.4. Let us first look at the squared gain function $|H(\omega; a^*)|^2$. In Figure 5.3(a), $|H(\omega; a^*)|^2$ is plotted for $\eta = 0.92$ as a function of the normalized frequency $f := \omega/\pi$, where $a^*$ is the AR parameter corresponding to $\omega_1 = 0.32\pi$ and $\omega_2 = 0.45\pi$. Clearly, the squared gain function has peaks around the frequencies of the signal, so that the AR filter is able to enhance the sinusoids. On the other hand, the peaks of $|H(\omega; a^*)|^2$ are not exactly located at the sinusoidal frequencies. These slightly biased peaks are required by the parametrization property (A2) in order to make the PF estimator consistent. A further illustration of this point is presented by Figure 5.3(b), where the poles of the AR filter are plotted in the complex plane. It can be seen that the PF method does not in general force the poles of the AR filter to take the same angular frequencies as the signal in order to produce a consistent estimator, as guaranteed by Theorem 5.2.

Figure 5.3: (a) Plot of squared gain with $\eta = 0.92$ and $a = a^*$ as a function of the normalized frequency in the case of two sinusoids with $\omega_1 = 0.32\pi$ and $\omega_2 = 0.45\pi$. (b) Location of poles, with dotted lines indicating true frequencies on the unit circle.

Finally, for any given $\theta = [\theta_1, \theta_2]^T$ (not necessarily in $\Theta_0$), let $\zeta_1 = \rho_1\exp(i\lambda_1)$ and $\zeta_2 = \rho_2\exp(i\lambda_2)$ be the zeros of the 4th-degree polynomial

$1 + \theta_1 z^{-1} + \theta_2 z^{-2} + \theta_1 z^{-3} + z^{-4}$

such that $0 < \lambda_1 < \lambda_2 < \pi$ (or $-\pi < \lambda_2 < \lambda_1 < 0$). Then $\zeta_1$ and $\zeta_2$ can be explicitly expressed in closed form. In fact, it can be shown that when $\Delta := \theta_1^2 - 4\theta_2 + 8$ is non-negative, we have

$\zeta_1 = \tfrac12\bigl(s_1 + \sqrt{s_1^2 - 4}\bigr)$ and $\zeta_2 = \tfrac12\bigl(s_2 + \sqrt{s_2^2 - 4}\bigr)$

where $s_1 = \tfrac12(-\theta_1 + \Delta^{1/2})$ and $s_2 = \tfrac12(-\theta_1 - \Delta^{1/2})$; and when $\Delta < 0$, we have

$\zeta_1 = \tfrac12\bigl(s_1 + \sqrt{s_1^2 - 4}\bigr)$ and $\zeta_2 = \tfrac12\bigl(s_1 - \sqrt{s_1^2 - 4}\bigr)$.

To obtain $\cos\lambda_1$ and $\cos\lambda_2$ from $\zeta_1$ and $\zeta_2$, we have

$\cos\lambda_1 = \max(\Re\{\zeta_1/|\zeta_1|\}, \Re\{\zeta_2/|\zeta_2|\})$

$\cos\lambda_2 = \min(\Re\{\zeta_1/|\zeta_1|\}, \Re\{\zeta_2/|\zeta_2|\}).$

Substituting these results in (5.30) gives the required projection of $\theta$ onto $\Theta_0$.
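The projection just described can be sketched in a few lines of code. This is an illustrative reading of the closed-form expressions above, not the author's implementation; the function name `project_onto_theta0` is hypothetical, and complex square roots are taken with Python's principal branch, which only affects the sign of the imaginary parts and not the real parts that are used.

```python
import cmath

def project_onto_theta0(theta1, theta2):
    """Map an arbitrary (theta1, theta2) to a point of the form (5.30)."""
    delta = theta1 * theta1 - 4.0 * theta2 + 8.0
    root = cmath.sqrt(delta)
    s1 = 0.5 * (-theta1 + root)
    s2 = 0.5 * (-theta1 - root)
    zeta1 = 0.5 * (s1 + cmath.sqrt(s1 * s1 - 4.0))
    if delta >= 0:
        zeta2 = 0.5 * (s2 + cmath.sqrt(s2 * s2 - 4.0))
    else:
        # For delta < 0 both zeros are taken from s1, with opposite root signs.
        zeta2 = 0.5 * (s1 - cmath.sqrt(s1 * s1 - 4.0))
    r1 = (zeta1 / abs(zeta1)).real   # cosine of the angle of zeta1
    r2 = (zeta2 / abs(zeta2)).real   # cosine of the angle of zeta2
    cos1, cos2 = max(r1, r2), min(r1, r2)
    # Substitute the cosines back into (5.30).
    return -2.0 * (cos1 + cos2), 2.0 * (1.0 + 2.0 * cos1 * cos2)
```

A point that already has the form (5.30) is left unchanged by this map, while a point such as $(0, 3)$, for which $\Delta < 0$, is sent to $(0, 2)$ on the upper boundary of the region described by (5.31).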

5.5  Experimental  Results 

In this section, we provide some simulation results to demonstrate the performance of the PF method under various circumstances. For simplicity, our simulations are based on the case of a single sinusoid and the case of two sinusoids, as discussed in the preceding section. From these results we shall show the effect of the bandwidth parameter $\eta$ of the AR filter on the sensitivity of convergence to initial guesses, and on the estimation accuracy of the PF estimator. We shall also gain some insight into the ability of the PF method to resolve closely spaced frequencies which are unresolvable by periodogram analysis.

5.5.1 Univariate PF Method

For the univariate PF method corresponding to $q = 1$ in Section 5.4.1, we find it convenient to consider an alternative representation of the mapping $\hat\alpha(\alpha)$ in terms of the normalized frequency $f = \omega/\pi$. Note that for any $\alpha \in \mathcal{A}_0(\eta)$ we can always write $\alpha = -2\cos\omega = -2\cos(\pi f)$ with some $f \in (0, 1)$. Let us define the mapping

$\varphi(f) := \arccos\{-\tfrac12\hat\alpha(-2\cos(\pi f))\}/\pi.$

It is clear that if $\alpha = -2\cos(\pi f)$ is a fixed-point of $\hat\alpha(\alpha)$ then $f$ is a fixed-point of $\varphi(f)$, and vice versa. With the help of this mapping, the algorithm in (4.8) is transformed into a fixed-point iteration in terms of $f$, namely,

$f_m = \varphi(f_{m-1}) \qquad (m = 1, 2, \ldots). \qquad (5.33)$

It is therefore sufficient to study the behavior of $\varphi(f)$ as a function of $f$ in order to understand the convergence of the iteration. Notice that the derivative of $\varphi(f)$ at the fixed-point $\hat f = \arccos(-\hat\alpha/2)/\pi$ can be easily shown to be $\hat\alpha'(\hat\alpha)$. With $\hat\alpha$ being the PF estimator as given in Theorem 4.2 so that $|\hat\alpha'(\hat\alpha)| < 1$, the mapping $\varphi(f)$ is also contractive in a neighborhood of the fixed-point $\hat f$. Since the least squares technique is used in the calculation of $\hat\alpha(\alpha)$ as specified by (4.12), we refer to $\varphi(f)$ as the least squares mapping of $f$. Figure 5.4 plots 10 independent realizations of the least squares mapping $\varphi(f)$ in the case of a single zero-phase sinusoid with $f_1 = 0.31$ in Gaussian white noise. The SNR is $-3$ dB, and the data length is $n = 100$ for Figure 5.4(a) and Figure 5.4(b), and $n = 500$ for Figure 5.4(c) and Figure 5.4(d).

Figure 5.4: Least squares mapping $\varphi(f)$ in the case of a single sinusoid with $f_1 = \omega_1/\pi = 0.31$, $\phi_1 = 0$, and SNR $= -3$ dB. In (a) and (b), the data length is $n = 100$ for each of the 10 realizations plotted, with $\eta = 1$ for (a) and $\eta = 0.96$ for (b). The data length is increased to $n = 500$ in (c) and (d), with $\eta$ being 1 and 0.96, respectively.

It is clear that $\varphi(f)$ always has an attractive fixed-point $\hat f$ near the true frequency, and that the variation of the fixed-point is directly related to $1 - \eta$. This suggests that the highest accuracy is achieved with $\eta = 1$. On the other hand, by comparing Figure 5.4(a) with Figure 5.4(b), and Figure 5.4(c) with Figure 5.4(d), it can be seen that the basin of attraction (the collection of $f$ from which, as initial guesses, the iteration (5.33) converges to $\hat f$) is significantly larger when $\eta = 0.96$ than when $\eta = 1$, indicating that the PF method with a relatively small $\eta$ is able to accommodate poor initial guesses for the iteration (5.33) to converge to the desired fixed-point. Moreover, comparing the graphs in Figure 5.4 column-wise reveals that the basin of attraction with $\eta = 0.96$ is basically not affected as the data length $n$ increases, whereas the basin of attraction with $\eta = 1$ is inversely related to $n$. Therefore an initial accuracy of $O(1)$ is sufficient when $\eta < 1$, as compared to $o(n^{-1})$ when $\eta = 1$. In summary, by studying the behavior of the mapping $\varphi(f)$, this experiment confirms the necessity of starting with a relatively small value of $\eta$ to accommodate poor initial guesses and afterwards gradually increasing $\eta$ toward 1 to improve the estimation accuracy.
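The strategy of starting with a small $\eta$ and increasing it toward 1 can be sketched for a single sinusoid as follows. This is a minimal illustration under stated assumptions, not the thesis's code: the parametrization (5.25) and the least squares representation (4.12) are not reproduced in this section, so the sketch assumes the filter $y_t(\alpha) + \eta\theta(\alpha)y_{t-1}(\alpha) + \eta^2 y_{t-2}(\alpha) = y_t$ with $\theta(\alpha) = (1+\eta^2)\alpha/(2\eta)$ and a least squares map that regresses $y_t(\alpha) + y_{t-2}(\alpha)$ on $y_{t-1}(\alpha)$. All function names are hypothetical.

```python
import numpy as np

def ar2_filter(y, eta, theta):
    """AR(2) recursion with zero initial values (assumed form of the filter)."""
    out = np.zeros(len(y))
    for t in range(len(y)):
        p1 = out[t - 1] if t >= 1 else 0.0
        p2 = out[t - 2] if t >= 2 else 0.0
        out[t] = y[t] - eta * theta * p1 - eta ** 2 * p2
    return out

def ls_map(y, alpha, eta):
    """Assumed least squares mapping alpha -> alpha_hat(alpha)."""
    theta = (1.0 + eta ** 2) / (2.0 * eta) * alpha   # ASSUMED form of (5.25)
    z = ar2_filter(y, eta, theta)
    return -np.sum(z[1:-1] * (z[2:] + z[:-2])) / np.sum(z[1:-1] ** 2)

def phi(y, f, eta):
    """Least squares mapping in terms of the normalized frequency f."""
    alpha_new = ls_map(y, -2.0 * np.cos(np.pi * f), eta)
    return float(np.arccos(np.clip(-0.5 * alpha_new, -1.0, 1.0)) / np.pi)

def pf_sweep(y, f0, etas=(0.85, 0.90, 0.95, 0.98, 0.99, 0.999),
             tol=1e-5, max_iter=100):
    """Iterate f_m = phi(f_{m-1}) for each eta, warm-starting from the previous eta."""
    f = f0
    for eta in etas:
        for _ in range(max_iter):
            f_new = phi(y, f, eta)
            converged = abs(f_new - f) < tol
            f = f_new
            if converged:
                break
    return f
```

A call such as `pf_sweep(y, 0.6)` starts from a deliberately poor guess; each stage reuses the estimate of the previous stage as its initial value, which is the design choice the experiment above argues for.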

To further illustrate this point, Figure 5.5 presents the negative logarithms of mean-squared error (mse) for the normalized frequency estimates $\hat f = \arccos(-\hat\alpha/2)/\pi$ with various values of $\eta$, based on 100 independent realizations of a single sinusoid in Gaussian white noise with different lengths. Here, the true frequency of the sinusoid is $\omega_1 = 0.42\pi$ ($f_1 = 0.42$) and the phase is $\phi_1 = 0.1\pi$. The SNR is fixed at 0 dB for each realization by adjusting the sample variance of the noise according to the sample variance of the signal. Recall (see Section 5.3) that a sequence of PF estimators $\{\hat\alpha(\eta_k)\}$, and hence $\{\hat f(\eta_k)\}$, can be obtained in correspondence with an increasing sequence $\{\eta_k\}$ in such a way that $\hat\alpha(\eta_k)$ is produced by the iteration (4.8) using $\hat\alpha(\eta_{k-1})$ as the initial value. In Figure 5.5, the $\eta$-sequence contains $\eta = 0.85$, 0.90, 0.95, 0.98, 0.99, and 0.999 for each fixed data length $n$. The initial guess that generates the PF estimator with $\eta = 0.85$ is fixed at $\alpha_0 = -2\cos(0.6\pi)$ as the data length grows from $n = 100$ to $n = 1900$, so that the initial accuracy is merely $O(1)$. For each fixed $n$ and $\eta$, the iteration (4.8) is terminated at the $m$th step if $|f_m - f_{m-1}| < 10^{-5}$.

Figure 5.5: Plot of $-\log(\mathrm{mse})$ against the data length $n$ ($\times 100$) with (bottom up) $\eta = 0.85$, 0.90, 0.95, 0.98, 0.99, and 0.999. The dashed curves indicate the asymptotic variances of $\hat f$ for (bottom up) $\eta = 0.85$, 0.90, 0.95, 0.98, and 0.99; and the dark solid curve stands for the asymptotic variance of the NLS estimator.

Two conclusions can be drawn immediately from this figure. First of all, it is clear as before that starting with initial guesses of accuracy $O(1)$ the PF estimator is able to improve the estimation accuracy and eventually achieve the accuracy $O(n^{-3/2})$ of NLS (or periodogram analysis). The key point is that the improvement of accuracy is obtained not by switching from one completely different method to another (e.g., PF with $\eta = 1$ initiated by DFT, as suggested by Quinn and Fernandes (1991)), but by an integrated method of linear least squares estimation plus linear recursive filtering, with the bandwidth parameter $\eta$ increasing toward 1. This simple integration makes hardware implementation easier in some applications. Secondly, the mse closely follows the theoretical asymptotic variance of the PF estimator given by (5.26) for relatively small data lengths if $\eta$ is not too close to 1. When $\eta$ is close to 1, however, a large sample is required for the theoretical results to be meaningful. This suggests that a more careful analysis be considered when $1 - \eta$ is comparable with $n^{-1}$.

Figure 5.6: Univariate least squares mapping $\varphi(f)$ in the case of two sinusoids. The data length is $n = 100$ for each of the 10 realizations plotted, and the SNR is 0 dB per sinusoid. (a) $\eta = 0.99$ for well-separated frequencies with $f_1 = 0.47$ and $f_2 = 0.51$. (b) $\eta = 1$ for closely spaced frequencies with $f_1 = 0.485$ and $f_2 = 0.495$.

As mentioned earlier (see Section 5.4.1), in the case of multiple sinusoids where the frequencies are well separated, the univariate PF method can be applied in parallel to one frequency at a time as if there were only a single sinusoid. This is because the AR filter is bandpass, so that the mapping $\varphi(f)$ is not significantly affected by the frequency components far away from the center of the filter's effective pass-band. As a result, local attractive fixed-points appear in $\varphi(f)$ near the frequencies of the sinusoidal signal, as illustrated by Figure 5.6(a). In this figure, 10 independent realizations of $\varphi(f)$ are plotted, each from a data record of length $n = 100$ containing two zero-phase sinusoids whose (normalized) frequencies, $f_1 = 0.47$ and $f_2 = 0.51$, are separated by two Fourier bins of width $\Delta f := 2/n = 0.02$. The noise is white Gaussian, and the SNR is 0 dB per sinusoid. Clearly, the univariate mapping $\varphi(f)$ has two distinct local attractive fixed-points, each corresponding to a sinusoid. This, however, is no longer the case, even with $\eta = 1$, when the frequencies are closer than a Fourier bin, as shown in Figure 5.6(b) where the true frequencies are $f_1 = 0.485$ and $f_2 = 0.495$ while other conditions remain the same. This phenomenon is due to the resolution limit of the AR filter as discussed earlier in Section 5.4.1. To resolve closely spaced frequencies, we must rely on the multivariate PF method that deals with the frequencies simultaneously.
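The separation figures quoted above follow from the stated bin width $\Delta f = 2/n$ for normalized frequencies. The short sketch below just redoes that arithmetic; the helper name `separation_in_bins` is hypothetical.

```python
def separation_in_bins(f1, f2, n):
    """Separation of two normalized frequencies in Fourier bins of width 2/n."""
    bin_width = 2.0 / n
    return abs(f2 - f1) / bin_width

# Frequencies of Figure 5.6(a): about two bins apart for n = 100.
well_separated = separation_in_bins(0.47, 0.51, 100)
# Frequencies of Figure 5.6(b): about half a bin apart for n = 100.
closely_spaced = separation_in_bins(0.485, 0.495, 100)
```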

5.5.2 Multivariate PF Method for Two Sinusoids

Let us now consider the multivariate PF method using the AR filter for two sinusoids ($q = 2$) in Gaussian white noise. All of the following simulations are based on 100 independent realizations of $\{y_t\}$ with a relatively short length of $n = 100$. The phases of the sinusoids are fixed at zero, and the sample variance of the noise is adjusted in each realization according to the sample variance of the signal in order to achieve the required signal-to-noise ratio.

Furthermore, in both PF and GLS (corresponding to the parametrizations (5.6) and (5.2), respectively) the poles of the AR filter are constrained to be on the circle $|z| = \eta$, by projection if necessary (see Section 5.1.2), so that the parameter $\eta$ effectively controls the bandwidth of the AR filter and the performance of the estimators. For convenience, the following simulation results are again given in terms of the normalized frequencies $f_k = \omega_k/\pi$, and the average mean-squared error

$\mathrm{mse} := \tfrac12\{E(\hat f_1 - f_1)^2 + E(\hat f_2 - f_2)^2\}$

is employed as an overall performance index. Moreover, we define the average bias and average variance of the frequency estimates by

$\mathrm{bias} := \tfrac12\{(E(\hat f_1) - f_1)^2 + (E(\hat f_2) - f_2)^2\}$ and $\mathrm{var} := \tfrac12\{\mathrm{var}(\hat f_1) + \mathrm{var}(\hat f_2)\}$

respectively. The frequency estimates of both PF and GLS are obtained by the fixed-point iteration (4.8) in connection with (2.16), which provides the relationship between the frequency and AR estimates. The stopping rule of the iteration is given by

$\sqrt{(\hat f_1^{(m)} - \hat f_1^{(m-1)})^2 + (\hat f_2^{(m)} - \hat f_2^{(m-1)})^2} < 10^{-5}.$

In other words, the iteration is terminated at the $m$th iteration if this inequality is satisfied.
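The three performance indices defined above can be computed from a set of Monte Carlo estimates as in the following sketch. The function name `average_indices` and the array layout (one row per realization, one column per frequency) are assumptions made for illustration; population variances (divisor equal to the number of realizations) are used, so that the mse is exactly the sum of the average bias and the average variance.

```python
import numpy as np

def average_indices(f_hat, f_true):
    """Return (mse, bias, var) averaged over the frequencies.

    f_hat  : array of shape (realizations, 2) with frequency estimates
    f_true : array of shape (2,) with the true normalized frequencies
    """
    f_hat = np.asarray(f_hat, dtype=float)
    f_true = np.asarray(f_true, dtype=float)
    mse = np.mean((f_hat - f_true) ** 2, axis=0).mean()
    bias = ((f_hat.mean(axis=0) - f_true) ** 2).mean()   # average squared bias
    var = f_hat.var(axis=0).mean()                        # ddof=0 by default
    return mse, bias, var
```

Because the "bias" index is an average of squared biases, the decomposition mse = bias + var holds exactly for these sample quantities, which is consistent with the way the columns of the tables below add up.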

In our simulations, we first compare the performance of PF and GLS in two cases where the frequencies are separated by four and two Fourier bins, respectively. In both cases the SNR is fixed at 0 dB per sinusoid, while the bandwidth parameter $\eta$ in the AR filter (5.1) takes on different values. Since $\eta$ varies, it is convenient to explicitly write the corresponding frequency estimates $(\hat f_1(\eta), \hat f_2(\eta))$ as functions of $\eta$. Table 5.1 and Table 5.2 present some statistics of the frequency estimates for eight ascending values of $\eta$, that is, $\eta_1 = 0.95$, $\eta_2 = 0.96$, ..., $\eta_8 = 1$. The mean and variance of the stopping time $m$ are also given as "complexity" in the form of "mean ± variance".

In both PF and GLS, we use Prony's estimator $\hat a_{LS}$ in (2.9) as the initial guess of the AR parameter $a$, corresponding to the first value $\eta_1 = 0.95$. When the iteration terminates, the resulting AR estimate, denoted by $\hat a(\eta_1)$, is used not only to obtain the frequency estimates $(\hat f_1(\eta_1), \hat f_2(\eta_1))$ but also to initiate the iteration for the next value $\eta_2 = 0.96$. In general, as $\eta$ grows, we employ the previous AR estimate $\hat a(\eta_{k-1})$ to initiate the iteration (4.8) and yield $\hat a(\eta_k)$.
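The warm-start scheme and the stopping rule can be sketched as a generic driver. This is only an outline of the control flow: the callable `step` stands in for one pass of the iteration (4.8) mapped to frequencies, which is not reproduced here, and both `continuation` and the toy contraction used in the usage note are hypothetical.

```python
import math

def continuation(step, x0, etas, tol=1e-5, max_iter=200):
    """Run a fixed-point iteration for each eta, warm-starting from the previous eta.

    step : callable (x, eta) -> new x, where x is a pair of frequency estimates
    Returns the final estimate and the number of iterations used at each eta.
    """
    x = tuple(x0)
    counts = []
    for eta in etas:
        used = 0
        for m in range(1, max_iter + 1):
            x_new = tuple(step(x, eta))
            moved = math.dist(x_new, x)   # Euclidean distance, as in the stopping rule
            x = x_new
            used = m
            if moved < tol:
                break
        counts.append(used)
    return x, counts
```

With a toy contraction such as `lambda x, eta: tuple(0.5 * (xi + ti) for xi, ti in zip(x, (0.47, 0.51)))`, the first value of $\eta$ absorbs most of the iterations and later values stop almost immediately, which mirrors the pattern of the "complexity" columns in the tables: a larger iteration count at $\eta_1$, followed by small counts once good starting values are available.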

Table 5.1: PF & GLS Estimates for Well-Separated Frequencies

         |                   PF                   |                  GLS
  η      |  mse      bias     var      complexity |  mse      bias     var      complexity
  Prony  |  4.72e-3  4.40e-3  3.23e-4  -          |  4.72e-3  4.40e-3  3.23e-4  -
  0.950  |  2.20e-6  1.17e-8  2.19e-6  8.0 ± 0.6  |  2.34e-6  1.48e-7  2.19e-6  8.1 ± 0.8
  0.960  |  1.96e-6  9.14e-9  1.96e-6  3.1 ± 0.2  |  2.04e-6  9.47e-8  1.95e-6  3.2 ± 0.4
  0.970  |  1.79e-6  9.72e-9  1.78e-6  3.2 ± 0.3  |  1.84e-6  6.21e-8  1.77e-6  3.3 ± 0.4
  0.980  |  1.67e-6  1.64e-8  1.65e-6  3.3 ± 0.4  |  1.69e-6  4.96e-8  1.65e-6  3.4 ± 0.4
  0.985  |  1.63e-6  2.39e-8  1.60e-6  2.9 ± 0.4  |  1.65e-6  4.79e-8  1.60e-6  2.8 ± 0.4
  0.990  |  1.59e-6  3.60e-8  1.55e-6  3.0 ± 0.5  |  1.60e-6  5.08e-8  1.55e-6  2.7 ± 0.4
  0.995  |  1.55e-6  5.53e-8  1.49e-6  3.4 ± 0.4  |  1.55e-6  6.08e-8  1.49e-6  2.9 ± 0.6
  1.000  |  1.50e-6  8.27e-8  1.42e-6  3.7 ± 0.6  |  1.51e-6  8.29e-8  1.42e-6  3.4 ± 0.5

Table 5.2: PF & GLS Estimates for Closely-Spaced Frequencies

         |                   PF                   |                  GLS
  η      |  mse      bias     var      complexity |  mse      bias     var      complexity
  Prony  |  1.46e-2  1.40e-2  5.68e-4  -          |  1.46e-2  1.40e-2  5.68e-4  -
  0.950  |  1.81e-6  2.69e-8  1.78e-6  10.6 ± 3.4 |  4.68e-6  3.05e-6  1.63e-6  11.0 ± 5.3
  0.960  |  1.67e-6  6.20e-8  1.61e-6  3.3 ± 0.4  |  3.63e-6  2.17e-6  1.46e-6  3.8 ± 0.2
  0.970  |  1.63e-6  1.63e-7  1.46e-6  3.8 ± 0.3  |  2.95e-6  1.59e-6  1.36e-6  3.9 ± 0.2
  0.980  |  1.75e-6  4.21e-7  1.32e-6  4.3 ± 0.3  |  2.56e-6  1.29e-6  1.27e-6  3.7 ± 0.4
  0.985  |  1.90e-6  6.57e-7  1.25e-6  4.2 ± 0.4  |  2.46e-6  1.25e-6  1.21e-6  2.8 ± 0.4
  0.990  |  2.15e-6  9.83e-7  1.16e-6  4.4 ± 0.3  |  2.45e-6  1.31e-6  1.15e-6  3.0 ± 0.5
  0.995  |  2.47e-6  1.41e-6  1.06e-6  4.5 ± 0.5  |  2.56e-6  1.50e-6  1.06e-6  3.7 ± 0.7
  1.000  |  2.85e-6  1.91e-6  9.45e-7  4.5 ± 0.8  |  2.86e-6  1.91e-6  9.46e-7  4.3 ± 0.8

Figure 5.7: Plot of mse ($\times 10^{-6}$) against $\eta$ for closely-spaced frequencies.

In Table 5.1, the true frequencies are separated by four Fourier bins of width $\Delta f = 0.02$ with $(f_1, f_2) = (0.41, 0.59)$. As we can see, Prony's estimator gives poor frequency estimates, while both PF and GLS significantly improve on Prony's estimator in terms of mean-squared error, even with a relatively small $\eta$. Moreover, the estimation accuracy can be further improved by increasing $\eta$ toward 1, just as in the single sinusoid case. Table 5.1 shows that as $\eta$ approaches 1 the PF and GLS estimates achieve a precision (mse) of $1.50 \times 10^{-6}$, very close to the asymptotic variance of the NLS estimator, which in this case equals $1.22 \times 10^{-6}$. Therefore, when the frequencies are well separated, PF and GLS have the same final performance, which approaches that of the NLS method, as $\eta$ increases toward 1.
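The NLS reference value quoted above can be reproduced from the limiting distribution $N(0, 12/\gamma_1)$ of $n^{3/2}(\hat\omega_1 - \omega_1)$ stated in Theorem 5.4 and Remark 5.4. The sketch below assumes that $\gamma_1$ is the signal-to-noise ratio on a linear scale (so that 0 dB corresponds to $\gamma_1 = 1$) and converts from $\omega$ to the normalized frequency $f = \omega/\pi$; the function name is hypothetical.

```python
import math

def nls_asymptotic_var_f(snr_db, n):
    """Approximate variance of the normalized frequency estimate, 12 / (gamma n^3 pi^2)."""
    gamma = 10.0 ** (snr_db / 10.0)      # assumption: SNR in dB -> linear ratio
    var_omega = 12.0 / (gamma * n ** 3)  # variance of the estimate of omega
    return var_omega / math.pi ** 2      # f = omega / pi scales the variance by 1 / pi^2

reference = nls_asymptotic_var_f(0.0, 100)   # roughly 1.22e-6 for n = 100 at 0 dB
```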

When the frequencies are close to each other, the PF estimator performs better than GLS, as can be seen from Table 5.2 and Figure 5.7. In this experiment, the true frequencies are $(f_1, f_2) = (0.47, 0.51)$ while all other conditions remain the same as in the previous one. Notice that the true frequencies are now separated by only two Fourier bins, as compared to four in the previous experiment. It is interesting to observe that as the bandwidth parameter $\eta$ increases toward 1 the mse of both methods no longer decreases monotonically, as it does in the case of a single sinusoid and in the case where the two frequencies are separated further by four Fourier bins. Instead, it starts increasing after a certain value of $\eta$ (see also Figure 5.7). The reason is the following.


Table 5.3: Estimation With η = 1

  (f1, f2)       mse      E(f̂1) ± var(f̂1)     E(f̂2) ± var(f̂2)     complexity
  (0.41, 0.59)   4.33e-6  0.409704 ± 1.30e-6  0.590529 ± 6.99e-6  17.9 ± 10.7
  (0.47, 0.51)   1.38e-4  0.468628 ± 9.37e-7  0.513672 ± 2.49e-4  24.3 ± 63.3


A closer examination of Table 5.1 and Table 5.2 reveals that as $\eta$ approaches 1 the bias increases while the variance decreases in both methods. In the first case where the frequencies are well separated (Table 5.1), the bias never dominates the variance, and hence the mse decreases basically along with the decrease of the variance. On the other hand, the bias becomes dominant as $\eta$ approaches 1 in the second case (see Table 5.2), and a trade-off between bias and variance takes place. As we can see from Figure 5.7, the best value of $\eta$ for the PF estimator lies between 0.96 and 0.98, where the mse achieves its smallest values. The GLS estimator is clearly inferior to the PF estimator in this example because of its relatively higher bias. Indeed, the bias and variance of the GLS estimator play an equal role in the mse, since their magnitudes are of the same order (see Table 5.2).
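The trade-off can be read off directly from the PF columns of Table 5.2. The sketch below simply re-enters those tabulated values (variable names are hypothetical), checks that the mse column agrees with bias plus variance up to rounding, and locates the value of $\eta$ with the smallest mse.

```python
# PF columns of Table 5.2 (closely-spaced frequencies), Prony row omitted.
etas = [0.950, 0.960, 0.970, 0.980, 0.985, 0.990, 0.995, 1.000]
pf_bias = [2.69e-8, 6.20e-8, 1.63e-7, 4.21e-7, 6.57e-7, 9.83e-7, 1.41e-6, 1.91e-6]
pf_var = [1.78e-6, 1.61e-6, 1.46e-6, 1.32e-6, 1.25e-6, 1.16e-6, 1.06e-6, 9.45e-7]
pf_mse = [1.81e-6, 1.67e-6, 1.63e-6, 1.75e-6, 1.90e-6, 2.15e-6, 2.47e-6, 2.85e-6]

# Largest discrepancy between the tabulated mse and bias + var (rounding only).
max_gap = max(abs(m - (b + v)) for m, b, v in zip(pf_mse, pf_bias, pf_var))

# Value of eta at which the tabulated PF mse is smallest.
best_eta = etas[pf_mse.index(min(pf_mse))]
```

Among the tabulated values the minimum falls at $\eta = 0.97$, inside the range 0.96 to 0.98 mentioned above; the bias grows and the variance shrinks monotonically along the rows.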

Table  5.2  illustrates  the  role  of  r}  as  a  parameter  that  can  be  utilized  to  balance  the 
bias  and  variance  of  the  PF  estimator  for  minimizing  the  mean-squared  error.  Now, 
in  Table  5.3,  we  illustrate  the  role  of  77  in  the  convergence  of  the  fixed-point  iteration 
(4.8)  when  initial  guesses  are  poor.  Instead  of  gradually  increasing  77  toward  1,  as  done 
in  Table  5.1  and  Table  5.2,  the  frequency  estimates  in  Table  5.3  were  obtained  right 
away  with  rj  —  1,  using  Prony’s  estimator  as  the  initial  guess.  This  is  equivalent  to 
the  iterative  procedure  that  employs  the  AR  filter  without  rj  and  starts  with  Prony’s 
estimator.  As  can  be  seen  from  Table  5.3,  the  mean-squared  error  is  higher  than  the 
mse  reported  in  Table  5.1  and  Table  5.2  corresponding  to  (gradually  achieved)  77  =  1, 
especially  for  the  second  case  where  the  frequencies  are  relatively  close  to  each  other. 


120 


This indicates that without $\eta$ in the AR filter the iteration (4.8) may fail to converge
to the desired fixed-point when poor initial guesses, such as Prony’s estimator, are
used. The reason is that the bandwidth of the AR filter without $\eta$ (or, equivalently,
with $\eta = 1$) is extremely narrow. Although it could be helpful to have a narrow
bandwidth for the enhancement of the sinusoids if good initial guesses are used, a
narrow bandwidth might not be able to capture the sinusoidal signal when tuned
according to inaccurate frequency estimates. This experiment verifies once again that
the safest way of applying the PF method is to start with a relatively small $\eta$, to
accommodate even poor initial guesses, and then gradually increase $\eta$ as improved
estimates from previous iterations become available.
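The bandwidth argument can be checked numerically. The following is a minimal sketch, assuming (the filter is not written out in this section) that for a single sinusoid the AR filter behaves like the all-pole filter $1/(1 - 2\eta\cos(\omega_0) z^{-1} + \eta^2 z^{-2})$ with poles at $\eta e^{\pm i\omega_0}$. The function names, the tuning frequency and the grid size are illustrative choices, not part of the PF method.

```python
import cmath
import math


def ar2_gain(omega, omega0, eta):
    """Squared gain at omega of 1 / (1 - 2*eta*cos(omega0) z^-1 + eta^2 z^-2).

    Assumed filter form: poles at eta * exp(+/- i*omega0), with eta < 1.
    """
    z_inv = cmath.exp(-1j * omega)
    denom = 1.0 - 2.0 * eta * math.cos(omega0) * z_inv + eta * eta * z_inv * z_inv
    return 1.0 / abs(denom) ** 2


def half_power_width(omega0, eta, grid=20000):
    """Width in radians of the band where the gain is at least half its peak."""
    omegas = [math.pi * k / grid for k in range(grid + 1)]
    gains = [ar2_gain(w, omega0, eta) for w in omegas]
    peak = max(gains)
    passband = [w for w, g in zip(omegas, gains) if g >= 0.5 * peak]
    return max(passband) - min(passband)


# Passband width for a filter tuned to omega0 = 0.35*pi at several values of eta.
for eta in (0.90, 0.98, 0.995):
    print(eta, round(half_power_width(0.35 * math.pi, eta), 4))
```

Under these assumptions the half-power width is roughly $2(1 - \eta)$ radians, so a filter with $\eta = 0.9$ passes a band about twenty times wider than one with $\eta = 0.995$, which is consistent with a small $\eta$ tolerating a mistuned filter.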

To show the performance of the PF estimator under different signal-to-noise ratios
when the frequencies are closely spaced within a Fourier bin, Figure 5.8(a) presents the
negative logarithm of the mse for various values of SNR, with the dotted line indicating
the asymptotic variance of NLS as a reference. In this example, the frequencies are
$(f_1, f_2) = (0.485, 0.495)$ and the bandwidth parameter $\eta$ is fixed at 0.985 in both
PF and GLS. Prony’s estimator again is used to initiate the fixed-point iteration
(4.8) for both methods. As can be seen, the mse of the PF estimator closely follows
the asymptotic variance of NLS when SNR > 2.5 dB, and the performance of both
PF and GLS deteriorates rapidly when the SNR is below this threshold. The poor
initial accuracy of Prony’s estimator is largely responsible for this particular value of
threshold. In fact, simulations show that the threshold can be extended to $-2$ dB if
the initial guesses are taken to be the two Fourier frequencies which correspond to the
largest absolute values in the FFT of the data. Averages of frequency estimates are
plotted against various SNR in Figure 5.8(b). It is clear that both PF and GLS are
able to resolve the frequencies which cannot be resolved by periodogram analysis, but
the PF method has a smaller bias which allows it to provide more accurate frequency
estimates than the GLS method.
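The alternative initial guess mentioned above, the two Fourier frequencies with the largest absolute DFT values, can be sketched as follows. This is only an illustration: a plain $O(n^2)$ DFT replaces an FFT routine to keep the example dependency-free, and the function name and the test signal are hypothetical.

```python
import cmath
import math


def two_largest_fourier_frequencies(y):
    """Return, in increasing order, the two Fourier frequencies 2*pi*k/n in (0, pi)
    whose DFT coefficients have the largest absolute values."""
    n = len(y)
    candidates = []
    for k in range(1, (n + 1) // 2):      # skip k = 0 (the mean) and the Nyquist bin
        coeff = sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        candidates.append((abs(coeff), 2.0 * math.pi * k / n))
    candidates.sort(reverse=True)          # largest magnitude first
    return sorted(w for _, w in candidates[:2])


# Noise-free illustration with two sinusoids placed exactly on Fourier frequencies.
n = 100
y = [math.cos(2 * math.pi * 10 * t / n) + 0.5 * math.cos(2 * math.pi * 20 * t / n)
     for t in range(n)]
initial_guess = two_largest_fourier_frequencies(y)
```

When both true frequencies fall within one Fourier bin, the two selected bins will typically be adjacent, so this guess is coarse; it is meant only as a starting point for the iteration.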




Figure 5.8: Closely-spaced frequencies with $(f_1, f_2) = (0.485, 0.495)$. (a) Plot of
$-\log(\mathrm{mse})$ against SNR in dB, with the dotted curve indicating the asymptotic
variance of NLS. (b) Plot of averaged frequency estimates against SNR in dB, with dotted
lines indicating true frequencies.




We noticed from our intensive simulations that if the distance between the two
frequencies is further reduced, it is very likely that the fixed-point $\hat{\alpha}$ will fall outside
the parameter space $\mathcal{A}_0(\eta)$, resulting in a single frequency estimate $\hat{f}_1 = \hat{f}_2$ between
the true frequencies. If some rough knowledge about the separation is available, more
accurate estimates can be obtained upon projecting $\hat{\alpha}$ back into $\mathcal{A}_0(\eta)$. Let $\hat{\theta}$ be the
projection of $\theta(\hat{\alpha})$ into $\Theta_0$. Then, $\hat{\alpha} := T^{-1}\hat{\theta}$ defines the projection of $\hat{\alpha}$ into $\mathcal{A}_0(\eta)$.
By this projection, the separation of $\hat{f}_1$ and $\hat{f}_2$ can be effectively controlled by $\eta$, as
can be seen in Figure 5.2, and the estimation accuracy can be improved
by judiciously selecting $\eta$. Figure 5.9 shows the improvement of PF over GLS on
the estimation accuracy when the frequencies are extremely close. In this experiment,
the true frequencies, $(f_1, f_2) = (0.41, 0.412)$, are only 10% apart relative to the width
of a Fourier bin. The frequency estimates were obtained with $\eta = 0.997$ and the
fixed-point iteration (4.8) was initiated by Prony’s estimator. Figure 5.9(a) shows the
negative logarithm of the mse for different values of SNR and Figure 5.9(b) presents
the averages of the frequency estimates. Compared to the GLS estimator, the PF
estimator has a much smaller bias which enables it to achieve a smaller mean-squared
error. Notice that the GLS estimator gives essentially a single frequency $\hat{f} \approx 0.411$
between the two true frequencies. This procedure, however, should not be considered
as a method that detects the number of sinusoids, since the projection of $\hat{\alpha}$ into $\mathcal{A}_0(\eta)$
implicitly requires the information about the number of sinusoids.
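The separations quoted in this section can be converted to Fourier bins with a few lines of arithmetic. This is a small sketch, assuming the normalization $\omega = \pi f$ used in the figure captions and a data length of $n = 100$ (stated for Figures 5.10 and 5.11, assumed here for the other experiments); the function name is illustrative.

```python
def separation_in_fourier_bins(f1, f2, n):
    """Separation of two frequencies, measured in Fourier bins.

    f1 and f2 are normalized so that omega = pi * f, and n is the data length.
    One Fourier bin spans 2*pi/n in omega, that is 2/n in f.
    """
    return abs(f2 - f1) / (2.0 / n)


# Frequency pairs used in the experiments of this section (n = 100 is an assumption).
for f1, f2 in [(0.35, 0.55), (0.485, 0.495), (0.41, 0.412)]:
    bins = separation_in_fourier_bins(f1, f2, n=100)
    print(f"({f1}, {f2}): {bins:.2f} bins")
```

With $n = 100$ the three pairs come out at ten bins, half a bin and a tenth of a bin, matching the "ten Fourier bins", "50%" and "10%" figures quoted in the text.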

Finally,  we  note  that  in  the  preceding  discussion  the  phases  were  fixed  at  zero. 
Experience  shows,  however,  that  when  the  phases  are  chosen  at  random  the  mse  may 
worsen  somewhat.  This  is  understandable  due  to  the  small  sample  size  which  cannot 
explain  the  addition  of  extra  sources  of  variability  (Kay  and  Marple,  1981). 

To end this section, let us investigate the behavior of the mapping $\hat{a}(\alpha)$ in the
case  of  two  sinusoids.  For  convenience,  we  transform  the  mapping  into  the  frequency 
domain,  as  we  have  done  in  the  single  sinusoid  case,  and  obtain  a  two-dimensional  least 




Figure 5.9: Closely-spaced frequencies with $(f_1, f_2) = (0.41, 0.412)$. (a) Plot of
$-\log(\mathrm{mse})$ against SNR in dB. (b) Plot of averaged frequency estimates against SNR
in dB with dotted lines indicating true frequencies.


Figure 5.10: Two-dimensional least squares mapping $\psi(f) = \varphi(f) - f$ in the case of
two zero-phase sinusoids with $(\omega_1, \omega_2) = (0.35\pi, 0.55\pi)$. The SNR is 0 dB per sinusoid,
and the data length is $n = 100$. A single realization of $\psi(f)$ with $\eta = 0.96$ is plotted
over the region $(f_1, f_2) \in [0.25, 0.45] \times [0.45, 0.65]$.

squares mapping $\varphi(f) = [\varphi_1(f_1, f_2), \varphi_2(f_1, f_2)]^T$, where $f := [f_1, f_2]^T$. More precisely,
we define $\varphi(f)$ as the composition of the following mappings:

$$\varphi : f \mapsto \alpha \mapsto \hat{a}(\alpha) \mapsto \hat{f} = \varphi(f),$$

where $\alpha$ is determined from the identities in (5.30) with $\pi f$ in place of $(\lambda_1, \lambda_2)$, and
$\hat{f}$ is obtained from the zeros of the AR polynomial corresponding to $\hat{a}(\alpha)$. With this
mapping, the fixed-point iteration (4.8) becomes $\hat{f}_m = \varphi(\hat{f}_{m-1})$ $(m = 1, 2, \ldots)$.
Figure 5.10 shows a single realization of the mapping $\psi(f) = [\psi_1(f), \psi_2(f)]^T := \varphi(f) - f$
together with the zero-plane, where the true frequencies are well separated by
ten Fourier bins. The intersection of $\psi_1(f)$ with the zero-plane defines the fixed-curve
$\varphi_1(f_1, f_2) = f_1$ on which the $f_1$-coordinate cannot be altered by the mapping $\varphi(f)$.
Similarly, the intersection of $\psi_2(f)$ with the zero-plane determines the fixed-curve
$\varphi_2(f_1, f_2) = f_2$ on which the $f_2$-coordinate cannot be changed by $\varphi(f)$. The fixed-




Figure 5.11: Two-dimensional least squares mapping $\psi(f) = \varphi(f) - f$ in the case
of two zero-phase sinusoids with $(\omega_1, \omega_2) = (0.485\pi, 0.495\pi)$. The SNR is 5 dB per
sinusoid, and the data length is $n = 100$. A single realization of $\psi(f)$ with $\eta = 0.985$
is plotted over the region $(f_1, f_2) \in [0.41, 0.56] \times [0.42, 0.57]$.

point of $\varphi(f)$ is therefore given by the intersection of these fixed-curves. Moreover,
the fixed-point iteration can be written in terms of $\psi(f)$ as follows

$$\hat{f}_m = \hat{f}_{m-1} + \psi(\hat{f}_{m-1}) \qquad (m = 1, 2, \ldots).$$

It is clear that $\psi_1(\hat{f}_{m-1})$ and $\psi_2(\hat{f}_{m-1})$ are the increments in $f_1$ and $f_2$, respectively,
at the $m$th iteration.
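The increment form of the iteration can be illustrated with a generic fixed-point loop. The sketch below is not the least squares mapping itself: `toy_phi` is a hypothetical linear contraction with fixed point $(0.35, 0.55)$, standing in for $\varphi$, which in the PF method would require filtering the data and refitting the AR model at every step.

```python
def fixed_point_iteration(phi, f0, tol=1e-10, max_iter=200):
    """Iterate f_m = f_{m-1} + psi(f_{m-1}), where psi(f) = phi(f) - f.

    Returns the final iterate and the number of iterations used.
    """
    f = list(f0)
    for m in range(1, max_iter + 1):
        psi = [p - x for p, x in zip(phi(f), f)]    # increments in f1 and f2
        f = [x + d for x, d in zip(f, psi)]
        if max(abs(d) for d in psi) < tol:          # increments vanish at a fixed point
            return f, m
    return f, max_iter


def toy_phi(f):
    """Hypothetical contraction with fixed point (0.35, 0.55); NOT the PF mapping."""
    f1, f2 = f
    return [0.35 + 0.3 * (f1 - 0.35) + 0.1 * (f2 - 0.55),
            0.55 + 0.2 * (f2 - 0.55)]


f_hat, iterations = fixed_point_iteration(toy_phi, [0.30, 0.60])
```

Stopping when both components of $\psi$ are negligible corresponds to reaching the intersection of the two fixed-curves.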

Figure 5.12(a) and Figure 5.12(b) present the contours of $\psi(f)$ viewed from above
and below the zero-plane. It is interesting to observe that the two fixed-curves roughly
coincide with the straight lines $f_1 = 0.35$ and $f_2 = 0.55$, respectively. This implies
that the two-dimensional search for the fixed-point of $\varphi(f)$ decouples into two
independent one-dimensional problems of finding a fixed-point along one of the
coordinates while keeping the other fixed. This phenomenon is similar to what we have
encountered in Chapter 1 where the multi-dimensional optimization problem of nonlinear
least squares can be approximated by a number of independent one-dimensional
optimization problems (periodogram analysis) when the frequencies are well separated.

For closely-spaced frequencies, however, the behavior of $\psi(f)$ is slightly different,
as one may have expected: the two-dimensional problem is no longer decoupled when
the frequencies occur within a Fourier bin. To illustrate this point, a single realization
of $\psi(f)$ is shown in Figure 5.11 where the true frequencies are separated only by 50%
of a Fourier bin. Figure 5.12(c) and Figure 5.12(d) present the contour plots together
with the diagonal line $f_1 = f_2$. It is clear that the fixed-curves do not independently
provide correct frequency estimates any more, since in the region $f_1 < f_2$ they are
skewed toward the lines $f_1 = 0.49$ and $f_2 = 0.49$, respectively, tending to yield a
single frequency estimate $f_1 = f_2 = 0.49$, i.e., the average of the true frequencies.
However, the fixed-curves, jointly, provide again the correct frequency estimates by
their intersection, namely, the (multivariate) fixed-point of the mapping $\varphi(f)$.

5.6  Concluding  Remarks 

Given a time series $\{y_1, \ldots, y_n\}$ from a stochastic process $\{y_t\}$ in (1.1), we considered
the classical problem of frequency estimation in the presence of additive noise. We proposed
the PF method that overcomes the predicament of inconsistency of Prony’s estimator
and provides consistent frequency estimates. As a general method of parametric
filtering, it unifies and extends several existing procedures of frequency estimation in
the literature.

Coupled with the AR filter that has an extra bandwidth parameter, the PF method
is able to accommodate poor initial guesses of accuracy $O(1)$ and improve them with
a simple iterative algorithm of linear least squares estimation plus linear recursive
filtering to achieve the same accuracy of $O(n^{-3/2})$ as the computationally cumbersome
procedure of nonlinear least squares. In the statistical analysis of the PF method, we




Figure 5.12: Contours of the two-dimensional least squares mapping $\psi(f)$. First row:
(a) Contour plot of $\psi(f)$ in Figure 5.10 above the zero-plane; (b) Contour plot of
$\psi(f)$ in Figure 5.10 below the zero-plane. Second row: (c) Contour plot of $\psi(f)$ in
Figure 5.11 above the zero-plane with the diagonal line; and (d) Contour plot of $\psi(f)$
in Figure 5.11 below the zero-plane with the diagonal line.


extended  the  classical  results  on  the  ergodicity  and  asymptotic  normality  of  sample 
autocovariances,  and  proved  the  uniform  strong  consistency  of  sample  autocovariances 
after  parametric  filtering  and  the  asymptotic  normality  of  sample  autocovariances  from 
the mixed-spectrum process $\{y_t\}$ of sinusoids in additive colored noise.

The current work can be continued and extended in many ways.
As we have seen, the AR filter works very well in many cases as a parametric filter in
the PF method. Although some other filters are available for single sinusoid estimation
(e.g., Kedem, 1990; Lopes, 1991; Kedem and Yakowitz, 1992), it is still interesting to
find other parametric filters that provide better results, especially in the case of multiple
sinusoids where the frequencies are relatively close to 0 or $\pi$.

The ability of the PF method to resolve closely-spaced frequencies is inherited from
the AR modeling of the sinusoidal signal that allows the data to extrapolate beyond
the observation interval without assuming them to be zero. Its ability to produce
accurate estimates when the AR filter is employed is due to the noise-cleaning capability
of AR filtering. These characteristics are also observed in many other methods
of frequency estimation, especially those that employ principal component analysis
(e.g., Tufts and Kumaresan, 1982). For future research, some comparisons
between these procedures are needed in order to understand their advantages and
disadvantages, although some results are available in this regard (e.g., Kay, 1988).

Furthermore, since the PF method assumes that the number of sinusoids is known,
it is necessary to couple it with a procedure that estimates this number. Some
eigenvalue-based procedures are available in the literature (e.g., Fuchs, 1988). Experience
shows that the PF method is likely to yield multiple zeros in the AR polynomial
when the number of sinusoids is less than the assumed value in the calculation of the
PF estimator while the SNR is sufficiently high (see also Kay, 1984). This makes it
possible to estimate the number of sinusoids $q$ by a certain goodness-of-fit test based
on the PF frequency estimates corresponding to a number of assumed values. For instance,
to test the hypothesis of $q = 1$ versus $q = 2$, we could first obtain the frequency
estimates $\hat{\omega}$ and $(\hat{\omega}_1, \hat{\omega}_2)$ using the PF method with $q = 1$ and $q = 2$, respectively.
If $\hat{\omega}_1$ and $\hat{\omega}_2$ are not significantly different, the hypothesis of $q = 1$ would be clearly
preferred. Otherwise, we would calculate the error $J'_n$ in (1.13), corresponding to $\hat{\omega}$
with $q = 1$ and $(\hat{\omega}_1, \hat{\omega}_2)$ with $q = 2$, respectively, and obtain $J'_n(1)$ and $J'_n(2)$. The
hypothesis of $q = 2$ would be favored if $J'_n(2)$ is significantly smaller than $J'_n(1)$. An
alternative way is to compare the estimated amplitudes of the sinusoids with estimated
frequencies and reject the hypothesis of $q = 2$ if one of the amplitudes is significantly
small. Preliminary results along this line seem quite promising, but rigorous statistical
analysis is still needed.
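Both comparisons described above can be sketched with an ordinary linear least squares fit of the amplitudes once the frequencies are given. This is only a sketch under stated assumptions: the residual sum of squares below is a stand-in for $J'_n$ in (1.13), which is not reproduced in this section; the frequencies are supplied by hand rather than by the PF method; and no significance threshold is proposed. The helper names are hypothetical.

```python
import math


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_sinusoids(y, omegas):
    """Fit sum_k A_k cos(w_k t) + B_k sin(w_k t), t = 1..n, by least squares.

    Returns the amplitudes sqrt(A_k^2 + B_k^2) and the residual sum of squares.
    """
    n = len(y)
    cols = []
    for w in omegas:
        cols.append([math.cos(w * t) for t in range(1, n + 1)])
        cols.append([math.sin(w * t) for t in range(1, n + 1)])
    gram = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    rhs = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    coef = solve_linear_system(gram, rhs)          # normal equations
    fitted = [sum(c * col[t] for c, col in zip(coef, cols)) for t in range(n)]
    rss = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    amps = [math.hypot(coef[2 * k], coef[2 * k + 1]) for k in range(len(omegas))]
    return amps, rss


# Noise-free data with two sinusoids, fitted under q = 1 and under q = 2.
n = 100
y = [math.cos(0.35 * math.pi * t) + math.cos(0.55 * math.pi * t) for t in range(1, n + 1)]
amps1, rss1 = fit_sinusoids(y, [0.35 * math.pi])
amps2, rss2 = fit_sinusoids(y, [0.35 * math.pi, 0.55 * math.pi])
```

In this toy case the residual under $q = 2$ is far smaller than under $q = 1$. Conversely, fitting two frequencies to data that contain only one sinusoid yields a near-zero second amplitude, which is the amplitude-based criterion mentioned in the text.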

It is also possible to combine the PF method with principal component analysis
and obtain a hybrid procedure of frequency estimation. It may have been noticed that
the AR estimation step in the PF method is based on the AR model of exact order
$2q$. A high-order AR model is a clear alternative. In fact, we could first estimate
the coefficients of a higher order AR model$^5$ by principal component analysis (see
Chapter 2), and obtain the AR parameter $\hat{a}(\alpha)$ for the exact model from the $2q$ zeros
of the (estimated) high-order AR polynomial which are in complex conjugate pairs and
closest to the unit circle. The AR parameter could in turn be employed in the filtering
step. This hybrid procedure cleans up the noise in both eigenvalue and frequency
domains, and hopefully would provide better frequency estimates in low SNR cases.
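The root-selection step of this hybrid procedure can be sketched as follows, assuming the zeros of the high-order AR polynomial have already been computed by some root finder. The function names and the example zeros are hypothetical; the principal component estimation of the high-order coefficients is not shown.

```python
import cmath
import math


def select_signal_roots(roots, q, tol=1e-8):
    """Keep the q upper-half-plane zeros closest to the unit circle.

    Each kept zero stands for a complex conjugate pair, so 2q zeros are retained.
    Real zeros (imaginary part <= tol) are discarded.
    """
    upper = [r for r in roots if r.imag > tol]
    upper.sort(key=lambda r: abs(abs(r) - 1.0))    # distance to the unit circle
    return sorted(upper[:q], key=cmath.phase)


def ar_coefficients(upper_roots):
    """Expand prod_k (1 - 2*Re(r_k) z^-1 + |r_k|^2 z^-2) into [1, a_1, ..., a_2q]."""
    coef = [1.0]
    for r in upper_roots:
        quad = [1.0, -2.0 * r.real, abs(r) ** 2]
        new = [0.0] * (len(coef) + 2)
        for i, c in enumerate(coef):
            for j, d in enumerate(quad):
                new[i + j] += c * d
        coef = new
    return coef


# Hypothetical zeros of an order-7 AR polynomial: two pairs near the unit circle,
# one pair well inside it, and one real zero.
zeros = []
for radius, angle in [(0.99, 0.35 * math.pi), (0.98, 0.55 * math.pi), (0.60, 0.80 * math.pi)]:
    z = cmath.rect(radius, angle)
    zeros += [z, z.conjugate()]
zeros.append(complex(0.5, 0.0))

signal_roots = select_signal_roots(zeros, q=2)
frequencies = [cmath.phase(r) for r in signal_roots]   # angles of the retained zeros
a_exact = ar_coefficients(signal_roots)                # order-2q AR coefficients
```

Working with one zero per conjugate pair guarantees that the rebuilt order-$2q$ polynomial has real coefficients.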

For those who are familiar with adaptive filtering, it is easy to see that the PF
method with the AR filter can be readily modified to obtain an adaptive (recursive)
algorithm capable of tracking time-varying frequencies in noise. In fact, since the AR
filter is already recursive in time, all we need is to employ the recursive least squares
algorithm (e.g., Haykin, 1986) to update the least squares estimator $\hat{a}(\alpha)$. Details in
this regard can be found in Li and Kedem (1989) and Dragosevic et al. (1982).
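The recursive least squares update referred to here can be sketched for the simplest case of one sinusoid. This is the standard exponentially weighted RLS recursion applied directly to a noise-free series obeying $y_t = 2\cos(\omega) y_{t-1} - y_{t-2}$; it omits the parametric filtering step and is not the adaptive algorithm of the cited references. The forgetting factor `lam` and the initialization constant `delta` are illustrative values.

```python
import math


def rls_ar2(y, lam=0.98, delta=1e6):
    """Recursive least squares for y_t ~ c1*y_{t-1} + c2*y_{t-2}.

    lam is the forgetting factor and delta scales the initial inverse
    correlation matrix. Returns the final [c1, c2] and the running
    frequency estimates arccos(c1 / 2).
    """
    theta = [0.0, 0.0]
    P = [[delta, 0.0], [0.0, delta]]
    track = []
    for t in range(2, len(y)):
        x = [y[t - 1], y[t - 2]]
        Px = [P[0][0] * x[0] + P[0][1] * x[1],
              P[1][0] * x[0] + P[1][1] * x[1]]
        denom = lam + x[0] * Px[0] + x[1] * Px[1]
        gain = [Px[0] / denom, Px[1] / denom]
        err = y[t] - (theta[0] * x[0] + theta[1] * x[1])    # a priori prediction error
        theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
        # P <- (P - gain * x^T P) / lam, using x^T P = (P x)^T because P is symmetric
        P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(2)] for i in range(2)]
        track.append(math.acos(max(-1.0, min(1.0, theta[0] / 2.0))))
    return theta, track


omega = 0.35 * math.pi
y = [math.cos(omega * t) for t in range(200)]
theta, track = rls_ar2(y)
```

A forgetting factor below one discounts old samples, which is what would allow the estimate to follow a slowly varying frequency; with noisy data the raw AR(2) fit is biased, which is the problem the filtering step of the PF method is designed to address.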

$^5$ Just slightly higher than $2q$, in order not to increase the computational complexity too much.




It  would  also  be  interesting  to  find  connections  and  possible  extensions  to  other 
problems  in  related  areas,  such  as  the  frequency  estimation  of  damped  sinusoids  and 
the estimation of direction of arrival (DOA).

On the theoretical side, a rigorous proof is still needed for multiple sinusoids when
the AR filter is used with $\eta = 1$. In particular, a careful study should be carried out
to analyze the situation when the frequencies are closely spaced with respect to $n^{-1}$.
Hannan and Quinn (1989) recently investigated the
nonlinear least squares method in this situation. More effort should be devoted
in this direction to analyze the PF as well as other methods. For the PF method
itself, a detailed analysis is needed in order to understand its behavior when $1 - \eta$ is
comparable with $n^{-1}$. This rules out the possibility of using the traditional technique
of stationary processes, as we employed in our statistical analysis, since the filter is
now a function of the data length $n$.

Finally,  as  a  general  idea,  parametric  filtering  has  close  relations  with  the  filter- 
bank  technique,  multiresolution  analysis,  and  wavelet  transformation,  all  of  which  can 
be  regarded  as  ways  of  extracting  useful  information  from  the  filtered  data  obtained 
via  parametric  filtering.  It  is  therefore  not  impossible  to  generalize  the  basic  ideas 
behind  the  PF  method  to  other  estimation/detection  problems,  an  example  of  which 
is  the  use  of  parametrized  first-order  sample  autocorrelation  function  as  a  tool  for 
discrimination  and  identification  of  different  signals  (Kedem  and  Li,  1989;  1992). 




Bibliography 


[1]  Abatzoglou,  T.  J.  (1985).  A  fast  maximum  likelihood  algorithm  for  frequency 
estimation  of  a  sinusoid  based  on  Newton’s  Method.  IEEE  Trans.  Acoust.,  Speech, 
Signal  Process.,  vol.  33,  no.  1,  pp.  77-89. 

[2]  An,  H.,  Chen,  Z.,  and  Hannan,  E.  J.  (1983).  The  maximum  of  the  periodogram. 
J.  Multivariate  Anal.,  vol.  13,  pp.  383-400. 

[3]  Bienvenu,  G.  and  Kopp,  L.  (1983).  Optimality  of  high  resolution  array  processing 
using  the  eigensystem  approach.  IEEE  Trans.  Acoust.,  Speech,  Signal  Process., 
vol.  31,  no.  5,  pp.  1235-1247. 

[4] Blackman, R. B. and Tukey, J. W. (1959). The Measurement of Power Spectra
From  the  Point  of  View  of  Communications  Engineering.  New  York:  Dover. 

[5] Bresler, Y. and Macovski, A. (1986). Exact maximum likelihood parameter estimation
of superimposed exponential signals in noise. IEEE Trans. Acoust., Speech,
Signal Process., vol. 34, no. 5, pp. 1081-1089.

[6] Brockwell, P. J. and Davis, R. A. (1987). Time Series: Theory and Methods. New
York: Springer-Verlag.

[7]  Cadzow,  J.  A.  (1982).  Spectral  estimation:  An  overdetermined  rational  model 
equation  approach.  Proc.  IEEE,  vol.  70,  no.  9,  pp.  907-939. 




[8] Cadzow, J. A. and Bronez, T. P. (1983). Time series identification: An annihilation
filter approach. Proc. of IEEE 1983 ASSP Spectrum Estimation Workshop
II, pp. 172-180.

[9]  Chan,  Y.  T.,  Lavoie,  J.  M.  M.,  and  Plant,  J.  B.  (1981).  A  parameter  estimation 
approach  to  estimation  of  frequencies  of  sinusoids.  IEEE  Trans.  Acoust.,  Speech, 
Signal  Process.,  vol.  29,  no.  2,  pp.  214-219. 

[10]  Chan,  Y.  T.  and  Langford,  R.  P.  (1982).  Spectral  estimation  via  the  high-order 
equations.  IEEE  Trans.  Acoust.,  Speech,  Signal  Process.,  vol.  30,  no.  5,  pp.  689- 
698. 

[11] Chicharo, J. F. and Ng, T. S. (1990). Gradient-based adaptive IIR notch filtering
for  frequency  estimation.  IEEE  Trans.  Acoust.,  Speech,  Signal  Process.,  vol.  38, 
no.  5,  pp.  769-777. 

[12]  Dragosevic,  M.  V.  and  Stankovic,  S.  S.  (1989).  A  generalized  least  squares  method 
for  frequency  estimation.  IEEE  Trans.  Acoust.,  Speech,  Signal  Process.,  vol.  37, 
no.  6,  pp.  805-819. 

[13]  Dragosevic,  M.  V.,  Stankovic,  S.  S.,  and  Carapic,  M.  (1982).  An  approach  to 
recursive  estimation  of  time-varying  spectra.  Proc.  ICASSP82,  vol.  3,  pp.  2080- 
3083. 

[14]  Fuchs,  J.  (1988).  Estimating  the  number  of  sinusoids  in  additive  white  noise. 
IEEE  Trans.  Acoust.,  Speech,  Signal  Process.,  vol.  36,  no.  12,  pp.  1846-1853. 

[15]  Grenander,  U.  and  Rosenblatt,  M.  (1957).  Statistical  Analysis  of  Stationary  Time 
Series.  New  York:  Wiley. 

[16]  Hannan,  E.  J.  (1970).  Multiple  Time  Series.  New  York:  Wiley. 




[17] Hannan, E. J. (1973). The estimation of frequency. J. Appl. Prob., vol. 10, pp.
510-519. 

[18]  Hannan,  E.  J.  and  Quinn,  B.  G.  (1989).  The  resolution  of  closely  adjacent  spectral 
lines.  J.  Time  Series  Analysis,  vol.  10,  no.  1,  pp.  13-31. 

[19]  Haykin,  S.  (1986).  Adaptive  Filter  Theory.  Englewood  Cliffs,  New  Jersey: 
Prentice-Hall. 

[20]  He,  S.  and  Kedem,  B.  (1989).  Higher  order  crossings  of  an  almost  periodic  random 
sequence in noise. IEEE Trans. Inform. Theory, vol. 35, no. 3, pp. 360-370.

[21]  He,  S.  and  Kedem,  B.  (1990).  The  zero-crossing  rate  of  autoregressive  processes 
and  its  link  to  unit  roots.  J.  Time  Series  Analysis,  vol.  11,  pp.  201-213. 

[22]  Hildebrand,  F.  B.  (1956).  Introduction  to  Numerical  Analysis.  New  York: 
McGraw-Hill,  Chapter  9. 

[23] Hua, Y. and Sarkar, T. K. (1988). Perturbation analysis of TK method for harmonic
retrieval problems. IEEE Trans. Acoust., Speech, Signal Process., vol. 36,
no. 2, pp. 228-240.

[24] Hua, Y. and Sarkar, T. K. (1990). Matrix pencil method for estimating parameters
of exponentially damped/undamped sinusoids in noise. IEEE Trans. Acoust.,
Speech, Signal Process., vol. 38, no. 5, pp. 814-824.

[25]  Hua,  Y.  and  Sarkar,  T.  K.  (1991).  On  SVD  for  estimating  generalized  eigenvalues 
of  singular  matrix  pencil  in  noise.  IEEE  Trans.  Signal  Process.,  vol.  39,  no.  4,  pp. 
892-900. 

[26]  Karlin,  S.  and  Taylor,  H.  M.  (1975).  A  First  Course  in  Stochastic  Processes.  New 
York:  Academic. 




[27]  Kaveh,  M.  and  Barabell,  A.  (1986).  The  statistical  performance  of  the  MUSIC 
and  the  minimum-norm  algorithms  in  resolving  plane  waves  in  noise.  IEEE  Trans. 
Acoust.,  Speech,  Signal  Process.,  vol.  34,  no.  2,  pp.  331-341. 

[28]  Kay,  S.  M.  (1984).  Accurate  frequency  estimation  at  low  signal-to-noise  ratio. 
IEEE  Trans.  Acoust.,  Speech,  Signal  Process.,  vol.  32,  no.  3,  pp.  540-547. 

[29] Kay, S. M. (1988). Modern Spectral Estimation: Theory and Application. Englewood
Cliffs, New Jersey: Prentice-Hall.

[30]  Kay,  S.  M.  and  Marple,  S.  L.  (1981).  Spectrum  analysis  -  a  modern  perspective. 
Proc.  of  IEEE,  vol.  69,  no.  11,  pp.  1380-1419. 

[31] Kay, S. M. and Shaw, A. K. (1988). Frequency estimation by principal component
AR spectral estimation method without eigendecomposition. IEEE Trans.
Acoust., Speech, Signal Process., vol. 36, no. 1, pp. 95-101.

[32] Kedem, B. (1990). Contraction mappings in mixed spectrum estimation. Presented
at Inst. for Mathematics and Its Applications, Univ. of Minnesota, Minneapolis.

[33] Kedem, B. and Li, T. H. (1989). Higher order crossings from a parametric family
of linear filters. Technical Report TR-89-47, Dept. of Math., Univ. of Maryland,
College Park.

[34]  Kedem,  B.  and  Li,  T.  H.  (1992).  Monotone  gain,  first-order  autocorrelation,  and 
zero-crossing  rate.  Ann.  Statist.,  vol.  19,  no.  3,  pp.  1672-1676. 

[35]  Kedem,  B.  and  Yakowitz,  S.  (1992).  Practical  aspects  of  a  fast  algorithm  for 
frequency  detection.  Revised. 

[36]  Kumaresan,  R.  and  Feng,  Y.  (1991).  FIR  prefiltering  improves  Prony’s  method. 
IEEE Trans. Signal Process., vol. 39, no. 3, pp. 736-741.




[37]  Kumaresan,  R.,  Scharf,  L.  L.,  and  Shaw,  A.  K.  (1986).  An  algorithm  for  pole-zero 
modeling  and  spectral  analysis.  IEEE  Trans.  Acoust.,  Speech,  Signal  Process.,  vol. 
34,  no.  3,  pp.  637-640. 

[38]  Kumaresan,  R.,  Tufts,  D.  W.,  and  Scharf,  L.  L.  (1984).  A  Prony  method  for  noisy 
data:  choosing  the  signal  components  and  selecting  the  order  in  exponential  signal 
models.  Proc.  IEEE,  vol.  72,  no.  2,  pp.  230-233. 

[39] Kung, S. Y., Arun, K. S., and Bashkar Rao, D. V. (1983). State-space and
singular-value decomposition-based approximation methods for the harmonic retrieval
problem. J. Opt. Soc. Amer., vol. 73, no. 12, pp. 1799-1811.

[40]  Lacoss,  R.  T.  (1971).  Data  adaptive  spectral  analysis  method.  Geophysics,  vol. 
36,  no.  4,  pp.  661-675. 

[41]  Lang,  S.  and  McClellan,  J.  (1980).  Frequency  estimation  with  maximum  entropy 
spectral  estimator.  IEEE  Trans.  Acoust.,  Speech,  Signal  Process.,  vol.  28,  pp. 
716-724. 

[42]  Lehmann,  E.  L.  (1983).  Theory  of  Point  Estimation.  New  York:  Wiley. 

[43] Li, T. H. (1991). On the estimation of sinusoidal signals in additive noise.
Unpublished manuscript.

[44]  Li,  T.  H.  and  Kedem,  B.  (1991).  Adaptive  frequency  tracking  by  zero-crossing 
counts. Technical Report TR91-26, Dept. of Math., Univ. of Maryland, College
Park. 

[45]  Li,  T.  H.  and  Kedem,  B.  (1992).  Strong  consistency  of  the  contraction  mapping 
method  for  frequency  estimation.  Technical  Report  TR92-22,  Systems  Research 
Center,  Univ.  of  Maryland,  College  Park. 




[46]  Li,  T.  H.,  Kedem,  B.,  and  Yakowitz,  S.  (1992).  Asymptotic  normality  of  the 
contraction  mapping  estimator  for  frequency  estimation.  Technical  Report  TR92- 
21,  Systems  Research  Center,  Univ.  of  Maryland,  College  Park. 

[47]  Ljung,  L.  (1987).  System  Identification  —  Theory  for  the  User,  Englewood  Cliffs, 
New  Jersey:  Prentice-Hall. 

[48] Lopes, S. (1991). Spectral analysis in frequency modulated models. Ph.D. dissertation,
Dept. of Math., Univ. of Maryland, College Park.

[49]  Mackisack,  M.  S.  and  Poskitt,  D.  S.  (1989).  Autoregressive  frequency  estimation. 
Biometrika,  vol.  76,  no.  3,  pp.  565-575. 

[50]  Mackisack,  M.  S.  and  Poskitt,  D.  S.  (1990).  Some  properties  of  autoregressive 
estimates  for  processes  with  mixed  spectra.  J.  Time  Series  Analysis,  vol.  11,  no. 
4,  pp.  325-337. 

[51]  Markushevich,  A.  I.  (1977).  Theory  of  Functions  of  a  Complex  Variable.  2nd  Ed. 
New  York:  Chelsea. 

[52]  Matausek,  M.  R.,  Stankovic,  S.  S.,  and  Radovic,  D.  V.  (1983).  Iterative  inverse 
filtering approach to the estimation of frequencies of noisy sinusoids. IEEE Trans.
Acoust.,  Speech,  Signal  Process.,  vol.  31,  no.  6,  pp.  1456-1463. 

[53] Ortega, J. M. and Rheinboldt, W. C. (1970). Iterative Solution of Nonlinear Equations
in Several Variables. New York: Academic.

[54]  Paliwal,  K.  K.  (1986).  Some  comments  about  the  iterative  filtering  algorithm  for 
spectral  estimation  of  sinusoids.  Signal  Process.,  vol.  10,  pp.  307-310. 

[55]  Pisarenko,  V.  F.  (1973).  The  retrieval  of  harmonics  from  a  covariance  function. 
Geophys.  J.  Roy.  Astronom.  Soc.,  vol.  33,  pp.  347-366. 




[56]  Porat,  B.  and  Friedlander,  B.  (1988).  Analysis  of  the  asymptotic  relative  efficiency 
of  the  MUSIC  algorithm.  IEEE  Trans.  Acoust.,  Speech,  Signal  Process.,  vol.  36, 
no.  4,  pp.  532-544. 

[57] Priestley, M. B. (1981). Spectral Analysis and Time Series, vols. 1 and 2. New
York:  Academic. 

[58]  Quinn,  B.  G.  and  Fernandes,  J.  M.  (1991).  A  fast  efficient  technique  for  the 
estimation  of  frequency.  Biometrika,  vol.  78,  no.  3,  pp.  489-497. 

[59] Rao, B. D. (1988). Perturbation analysis of an SVD-based linear prediction method
for estimating the frequencies of multiple sinusoids. IEEE Trans. Acoust.,
Speech, Signal Process., vol. 36, no. 7, pp. 1026-1035.

[60]  Rice,  J.  A.  and  Rosenblatt,  M.  (1988).  On  frequency  estimation.  Biometrika,  vol. 
75,  no.  3,  pp.  477-484. 

[61]  Rife,  D.  C.,  and  Boorstyn,  R.  R.  (1974).  Single-tone  parameter  estimation  from 
discrete-time observations. IEEE Trans. Inform. Theory, vol. 20, pp. 591-598.

[62]  —  (1976).  Multiple  tone  parameter  estimation  from  discrete-time  observations. 
Bell  Syst.  Technical  J.,  vol.  55,  pp.  1389-1410. 

[63]  Roy,  R.  and  Kailath,  T.  (1989).  ESPRIT  -  estimation  of  signal  parameters  via 
rotation  invariance  techniques.  IEEE  Trans.  Acoust.,  Speech,  Signal  Process.,  vol. 
37, no. 7, pp. 984-995.

[64]  Satorius,  E.  H.  and  Zeidler,  J.  R.  (1978).  Maximum  entropy  spectral  analysis  of 
multiple  sinusoids  in  noise.  Geophysics,  vol.  43,  no.  6,  pp.  1111-1118. 

[65]  Stoer,  J.  and  Bulirsch,  R.  (1980).  Introduction  to  Numerical  Analysis.  New  York: 
Springer-Verlag.




[66]  Stoica,  P.,  Friedlander,  B.,  and  Soderstrom,  T.  (1987).  Asymptotic  bias  of  the 
high-order  autoregressive  estimates  of  sinusoidal  frequencies.  Circuits,  Systems, 
Signal Processing, vol. 6, no. 3, pp. 287-298.

[67] Stoica, P., Moses, R. L., Friedlander, B., and Soderstrom, T. (1989). Maximum
likelihood estimation of the parameters of multiple sinusoids from noisy measurements.
IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 3, pp. 378-392.

[68]  Stoica,  P.  and  Nehorai,  A.  (1989).  Statistical  analysis  of  two  nonlinear  least- 
squares  estimators  of  sine  wave  parameters  in  the  colored-noise  case.  Circuits 
Systems Signal Process., vol. 8, no. 1, pp. 3-15.

[69] Truong-Van, B. (1990). A new approach to frequency analysis with amplified
harmonics.  J.  R.  Statist.  Soc.,  series  B,  vol.  52,  pp.  347-366. 

[70]  Tufts,  D.  W.  and  Kumaresan,  R.  (1982).  Estimation  of  frequencies  of  multiple 
sinusoids:  Making  linear  prediction  performance  like  maximum  likelihood.  Proc. 
IEEE,  vol.  70,  no.  9,  pp.  975-989. 

[71]  Ulrych,  T.  J.  and  Clayton,  R.  W.  (1976).  Time  series  modeling  and  maximum 
entropy. Phys. Earth Planet. Interiors, vol. 12, pp. 188-200.

[72]  Walker,  A.  M.  (1971).  On  the  estimation  of  a  harmonic  component  in  a  time  series 
with  stationary  independent  residuals.  Biometrika,  vol.  58,  no.  1,  pp.  21-36. 

[73]  Walker,  A.  M.  (1973).  On  the  estimation  of  a  harmonic  component  in  a  time  series 
with  stationary  dependent  residuals.  Adv.  Appl.  Prob.,  vol.  5,  pp.  217-241. 

[74] Whittle, P. (1952). The simultaneous estimation of time series’ harmonic components
and covariance structure. Trab. Estad., vol. 3, pp. 43-57.

[75]  Yakowitz,  S.  (1991).  Some  contributions  to  a  frequency  location  method  due  to 
He and Kedem. IEEE Trans. Inform. Theory, vol. 37, pp. 1177-1181.




[76]  Zoltawski,  M.  and  Stavrinides,  D.  (1989).  Sensor  array  signal  processing  via  a 
Procrustes  rotation  based  eigen-analysis  of  the  ESPRIT  data  pencil.  IEEE  Trans. 
Acoust., Speech, Signal Process., vol. 37, no. 6, pp. 832-961.




